Regarding the Claim Objections

Claims 22 to 24 stand objected to as having incorrect claim dependency.
As set forth herein, claims 22 to 24 have been amended to provide the correct claim dependency. 
As such, withdrawal of the objection is respectfully requested. 

I. DOUBLE PATENTING REJECTION 

The rejection of claims 25 to 28 is respectfully traversed. The Examiner indicates that 
these claims are substantial duplicates of claims 6 to 9. 

As set forth herein, claims 25 to 28 have been amended to depend from claim 21, thereby
providing the correct claim dependency. Thus, amended claims 25 to 28 are not substantial
duplicates of claims 6 to 9 and, as such, withdrawal of the rejection is respectfully requested.

II. REJECTION UNDER 35 U.S.C. §112 

The rejection of claims 1 to 9 and 21 to 28 under 35 U.S.C. §112, first paragraph, as
allegedly lacking an adequate written description, is respectfully traversed. The Examiner states
that "given the enormous scope and divergent nature of the subject matter encompassed by the
Factor IX of these claims, these examples fail to adequately represent the full genus." [page 6,
first paragraph of the Office Action]. Vas-Cath Inc. v. Mahurkar, 935 F.2d 1555 (Fed. Cir.
1991); Fiers v. Revel, 984 F.2d 1164 (Fed. Cir. 1993); and Regents of the Univ. of Calif. v. Eli
Lilly & Co., 119 F.3d 1559 (Fed. Cir. 1997), are cited in support of the rejection.

Claims 1 to 9 and 21 to 28 are adequately described. To provide an adequate written
description for gene sequences "requires more than a mere statement that it is part of the
invention . . . ; what is required is a description of the DNA itself." Fiers, 984 F.2d at 1170-71.
In Regents of the Univ. of Calif. v. Eli Lilly, the Federal Circuit explained that "description of a
genus of cDNAs may be achieved by means of a recitation of a representative number of cDNAs, defined
by nucleotide sequence, falling within the scope of the genus or of a recitation of structural
features common to the members of the genus, which constitute a substantial portion of the
genus." Id. at 1568. Although the Lilly court did not specify how many species constitute a
representative number, "Applicants are not required to disclose every species encompassed by
their claims, even in an unpredictable art." In re Angstadt, 537 F.2d 498, 502-03 (CCPA
1976).

Here, the specification discloses a representative number of Factor IX species to 
adequately describe a Factor IX genus. Furthermore, knowledge in the art regarding Factor IX 
structure and function, as well as critical and non-critical amino acid sequences for Factor IX 
function, was extensive at the time of the invention. 

First, the Examiner acknowledges that the specification discloses five mammalian
Factor IX species: human, mouse, canine, rabbit and bovine. The Examiner also acknowledges
that the specification discloses a functional Factor IX variant [see page 6, first paragraph
of the Office Action]. Thus, the specification discloses a total of six different mammalian
Factor IX species.

Second, knowledge in the art regarding the structure and function of Factor IX was
extensive at the time of the invention. To corroborate Applicants' position, submitted herewith
as Exhibits A to D are the following references: Sarkar et al. (Genomics 6:133 (1990)); Kurachi
et al. (Blood Coagulation and Fibrinolysis 4:953 (1993)); Bottema et al. (Am. J. Hum. Genet.
49:820 (1991)); and Giannelli et al. (Nucleic Acids Res. 20:2027 (1992)).

In Exhibit A, the authors describe sequencing the activation peptide and catalytic domain
of Factor IX in six species: sheep, pig, rabbit, guinea pig, rat and mouse. Thus, mammalian
Factor IX sequences in addition to those referred to in the specification were known in the art at
the time of the invention.

In Exhibit B, the detailed structure and function of Factor IX regions are described
(pages 954-958). For example, Factor IX domains (catalytic, Gla, EGF, etc.) are illustrated in
Fig. 2 (page 955); conserved regions are described (propeptide, page 955, second column); domains
important for function are described (Gla domain Ca++ binding, page 956, paragraph bridging the
first and second columns; EGF domain, page 957, conclusion of the first column; catalytic subunit,
page 958, first column, first full paragraph); and sequences that do not appear critical for
function are described (spacer sequences, which comprise 60% of the Factor IX sequence, paragraph
bridging pages 957 and 958; see also page 958, paragraph bridging the first and second columns).
Third, variant Factor IX sequences having function or having reduced function would
have been known to those skilled in the art based upon knowledge in the art at the time of the
invention. Again, Exhibit B indicates Factor IX amino acids that are not critical for function. For
example, a large number of Factor IX amino acid residues are spacer sequences, which comprise
60% of the Factor IX sequence and are replaceable with most other amino acid residues without
resulting in haemophilia B (paragraph bridging pages 957 and 958; see also page 958,
paragraph bridging the first and second columns). In addition, Exhibit B indicates that the first
EGF domain of Factor X can replace that of Factor IX without destroying function (page 957,
second column, first full paragraph).

In Exhibit C, the authors report 31 point mutations in Factor IX (see abstract and
Table 1). Ninety-five missense mutations are also identified, which occur at evolutionarily
conserved amino acids (see abstract). The authors then compared Factor IX sequences from four and
nine species and found that, in 40% of the residues, virtually any missense mutation in a minority
of residues would cause disease, while in the remaining residues, no missense mutation would cause
disease (see, for example, abstract and page 829, second column). As in Exhibit B, 60% of the
residues in Factor IX are identified as spacers (see abstract; see also Exhibit B).

Thus, as corroborated by Exhibits B and C, those skilled in the art would have known
which sequences of Factor IX could be modified without destroying function.
Consequently, those skilled in the art would also have known how to produce variant Factor IX
sequences having function.

Exhibit B also indicates Factor IX amino acid residues critical for function. In this
regard, at least 278 unique Factor IX variant sequences having severely reduced function were
known in the art in 1992 (page 960, paragraph bridging the first and second columns). Of these,
at least 29 were complete or partial gene deletions, 50 were short (less than 20 nucleotides)
deletions or insertions, and a large number were single-base missense or nonsense mutations
(page 960, second column). As to specific residues critical for function, for example: replacing
Glu27 with Val disturbs Ca++ binding (page 956, second column); replacing Asp47 with Gly, or Pro55
with Ala, severely reduces Factor IX function (page 957, bottom of first column); and function is
likewise impaired by replacing Cys132 with Arg, or Arg145 with Cys or His (page 957, bottom of
second column), and by replacing Pro287 with Leu, Ala291 with Pro, or Thr296 with Met (page 958,
first column, first full paragraph).
Exhibit D lists a database of all known Factor IX point mutations, additions and deletions
that cause haemophilia B. A total of 574 patient entries are described and, of these, 278 are
unique molecular events (see abstract).

Thus, in view of the large number of Factor IX mutations and their functional 
consequences known in the art at the time of the invention, as corroborated by Exhibits B to D, 
those skilled in the art would have known sequences of Factor IX that could not be modified 
without destroying or significantly impairing function. Consequently, those skilled in the art 
would have known the Factor IX variants that would not be desirable for optimal Factor IX 
function. 

Fourth, the cited Fiers v. Revel and Regents of the Univ. of Calif. v. Eli Lilly case law is
also clearly distinguishable from the subject application. For example, the five mammalian
Factor IX species and the exemplary Factor IX variant disclosed in the subject application, together
with the Factor IX sequence regions known in the art (as corroborated by Exhibit A), are in dramatic
contrast to the number of disclosed species at issue in Fiers v. Revel. In Fiers, Revel attempted
to claim priority to an Israeli patent application that did not disclose any DNA sequence. In Lilly,
the patent-in-suit disclosed a single rat insulin cDNA, which the court held did not provide an
adequate written description for generic claims directed to cDNA encoding vertebrate and
mammalian insulin. Thus, the subject application, which discloses five mammalian Factor IX
species and a functional Factor IX variant, taken together with Exhibit A, which indicates that
additional Factor IX sequences were known in the art, is clearly distinguishable from both Fiers
and Lilly.

Moreover, there was little, if any, knowledge in the art regarding the claimed gene/cDNA
sequences at issue in both Fiers and Lilly. In stark contrast, knowledge in the art regarding
Factor IX sequences, as well as Factor IX structure and function, was extensive, as corroborated by
Exhibits A to D discussed above.

Thus, in view of the fact that at least five mammalian Factor IX sequences were known in
the art at the time of the invention, that the structural and functional domains of mammalian
Factor IX were known, and that critical and non-critical Factor IX amino acid sequences and
regions were known in the art, as corroborated by Exhibits A to D, those skilled in the art would
have known which Factor IX amino acid residues could be modified without destroying function,
and also would have known that altering particular amino acid residues would destroy or
substantially impair Factor IX function. Consequently, those skilled in the art would have known
numerous functional Factor IX sequences for use in the claimed compositions and, as such,
would have been apprised of the Factor IX genus. Accordingly, an adequate written description for
the Factor IX genus is provided and, therefore, Applicants respectfully request that the rejection
under 35 U.S.C. §112, first paragraph, be withdrawn.
CONCLUSION



In summary, for the reasons set forth herein, Applicants maintain that claims 1 to 9 and 
21 to 28 clearly and patentably define the invention, respectfully request that the Examiner 
reconsider the various grounds set forth in the Office Action, and respectfully request the 
allowance of the claims which are now pending. 

If the Examiner would like to discuss any of the issues raised in the Office Action, 
Applicant's representative can be reached at (858) 509-4065. 

Please charge any additional fees, or make any credits, to Deposit Account No. 03-3975. 



PILLSBURY WINTHROP LLP 
11682 El Camino Real 
Suite 200 

San Diego, CA 92130 
(858) 509-4065 Telephone
(858) 509-4010 Facsimile 



Respectfully submitted, 



Date: 





Robert M. Bedgood, Ph.D. 
Reg. No. 43,488 
Agent for Applicant 
Direct Sequencing of the Activation Peptide and the Catalytic
Domain of the Factor IX Gene in Six Species

G. Sarkar, D. D. Koeberl, and S. S. Sommer¹

Department of Biochemistry and Molecular Biology, Mayo Clinic/Foundation, Rochester, Minnesota 55905

Received July 19, 1989; revised September 13, 1989



By means of RNA amplification with transcript sequencing (RAWTS) under low stringency conditions, sequence was obtained directly without cloning for the activation peptide and the catalytic domain of factor IX from six species: sheep, pig, rabbit, guinea pig, rat, and mouse. The data presented demonstrate that, by the appropriate design of oligonucleotides and by performance of a nested PCR under appropriate conditions, it is possible to obtain sequence on a battery of species with a minimum of oligonucleotide primers. A total of 5.2 kb of cross-species sequence was generated with RAWTS. The results indicate that (1) 69% of the amino acids in the catalytic domain, but only [...] of the amino acids in the activation peptide, are identical in humans and the six species; (2) the catalytic domain evolves at a slower rate, but the extent and pattern of conservation of amino acids in the activation peptide suggest that the peptide functions as more than a cleavage spacer that separates the heavy and light chains in the catalytically inactive zymogen; (3) 37% of the amino acids in the activation peptide and 34% of the amino acids in the catalytic domain are factor IX-specific; i.e., they are either identical or changed in a highly conservative fashion in factor IX, but not in other related coagulation proteases; (4) these conserved factor IX-specific amino acids fall into three clusters, which are candidates for involvement in the protein interactions specific to factor IX; (5) there is a human-specific deletion after lysine 142 and a rodent-specific insertion after alanine 161; (6) in guinea pig, the insertion is associated with a seven-amino-acid repeat that corresponds to a perfect repeat of a 21-bp sequence; (7) humans have lost a potential N-glycosylation site that is conserved in the other species; (8) in each species, a few nonconservative changes occur in amino acids that are otherwise completely conserved, suggesting that compensatory mutations may have occurred; and (9) when compared to that of mouse, the amino acid identity with guinea pig factor IX is no greater than that found for the non-rodent species, a result compatible with the postulated increased rate of evolution in rodents. © 1990 Academic Press, Inc.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession Nos. [illegible], [illegible], [illegible], and M23247.

¹ To whom reprint requests should be addressed.



INTRODUCTION 

Human factor IX is an activatable serine coagulation protease that is encoded by a 34-kb gene on the X chromosome (Yoshitake et al., 1985). Factor IX circulates in the plasma as a single-chain, vitamin K-dependent zymogen of 415 amino acids that contains 17% carbohydrate by weight. During clotting, factor IX may be activated by factor XIa in the presence of calcium and by factor VIIa in the presence of calcium and tissue factor (reviewed in Furie and Furie, 1988). Proteolysis by either factor XIa or factor VIIa releases a 35-amino-acid activation peptide. The primary role of activated factor IX in clotting is to activate factor X. The physiological complex includes factor IXa, calcium, phospholipid, and thrombin-activated factor VIII. Factor IXa also binds to anti-thrombin III, and it can activate factor VII in the presence of calcium and phospholipid (DiScipio et al., 1978; Masys et al., 1982).

DNA sequence for the coding region of factor IX is available for humans (Kurachi and Davie, 1982; Jaye et al., 1983; Anson et al., 1984; Jagadeeswaran et al., 1984; McGraw et al., 1985), and the amino acid sequence is available for the circulating zymogen from cow (Katayama et al., 1979). If additional sequences were available, it would be possible to better define the conserved amino acids. A subset of these amino acids will be generic in the sense that they are also conserved in factor VII, factor X, and protein C, a group of related coagulation proteases that have the same domains and identical exonic structures (Furie and Furie, 1988). The remainder of the conserved amino acids will be unique to factor IX. These will be candidates for specific interactions of factor IX with factors VII, VIII, X, and XI and anti-thrombin III.

ZooRAWTS is a technique that in principle allows the sequence of homologs in multiple species to be obtained rapidly without the need for cloning (Sarkar and Sommer, 1989). Here we demonstrate more extensively the feasibility of ZooRAWTS by obtaining over 5 kb of DNA sequence from the activation and catalytic domains of factor IX from six species. The data define amino acids that are specific for factor IX. The catalytic domain of factor IX is found to be highly conserved, and the activation domain also has a substantial fraction of conserved amino acids, strongly suggesting that it functions as more than just an inactivating spacer between the heavy and light chains.

MATERIALS AND METHODS 

Liver mRNAs were purchased from Clontech. 
RAWTS was performed as indicated below. 



1. First-strand cDNA synthesis: Twenty microliters of 50 µg/ml heat-denatured total RNA or mRNA, 50 mM Tris-HCl (pH 8.3), 8 mM magnesium chloride, 30 mM KCl, 1 mM DTT, 2 mM each dATP, dCTP, dGTP, and dTTP, 50 µg/ml oligo(dT)12-18, 1000 U/ml RNasin, and 1000 U/ml AMV reverse transcriptase were incubated at 42°C for 1 h, followed by 65°C for 10 min. Subsequently, 30 µl of H2O was added, generating a final volume of 50 µl.

2. PCR: The above sample ([...] µl) was added to 40 µl of 50 mM KCl, 10 mM Tris-HCl (pH 8.3), 2.0-8.0 mM MgCl2 (empirically determined for each set of primers), 0.01% (w/v) gelatin, 200 µM each dNTP, and 1 nM each primer (Perkin-Elmer-Cetus protocol). After 10 min at 94°C, 1 U of Taq polymerase was added, and 30-40 cycles of PCR were performed (denaturation: 1 min at 94°C; annealing: 2 min at 50°C; elongation: 3 min at 72°C) with the Perkin-Elmer-Cetus automated thermal cycler. One primer included a T7 or SP6 promoter, as previously described (Stoflet et al., 1988). After the last cycle of PCR, a final 10-min elongation was performed.

TABLE 1

Mismatches Compatible and Incompatible with Obtaining Sequence

Species and oligonucleotide   Sequence differences(a)            Sequence obtained(b)

I. OliD
  Human                       T                                  (illegible)
  Sheep                       A                                  (illegible)
  Pig                         (illegible)                        (illegible)
  Rabbit                      C T                                (illegible)
  Guinea pig                  . G . . . . T . . . . T            +
  Rat                         C                                  +
  Mouse                       5' GACTATCAAAATTCTACTCA 3'         +

II. OliF
  Human                       A                                  (illegible)
  Sheep                       5' TGCTGCATTCTGTGGA 3'             (illegible)
  Pig                         A C                                (illegible)
  Rabbit                      AG. . G                            +
  Guinea pig                  AA                                 (illegible)
  Rat                         AG                                 +
  Mouse                       . AG                               (illegible)

III. OliJ
  Human                       . G GC . C                         (illegible)
  Sheep                       A CTC .                            (illegible)
  Pig                         G C . . CG C                       (illegible)
  Rabbit                      CG.                                (illegible)
  Guinea pig                  . G .... C ... C ... GA . .        (illegible)
  Rat                                                            +
  Mouse                       5' ACATAGCTGTTTAGTATTA 3'          +

(a) The oligonucleotide sequence is shown, and differences between the oligonucleotide sequence and the species sequence are indicated. OliJ is in the upstream direction, so the sequence is complementary to the sequence in Fig. 3.

(b) PCR was performed using the indicated oligonucleotide and either OliA or OliC. Less than 10 pg of the 900-bp T7-SP6 segment was the input DNA. If an amplification product of appropriate size was seen, it was eluted from an agarose gel, transcribed, and sequenced with the PCR oligonucleotide.



3. Transcription: When a nested (internal) sequencing primer was used, 3 µl of the amplified material was added to 17 µl of the RNA transcription mixture. The final mixture contained 40 mM Tris-HCl (pH 7.5), 6 mM MgCl2, 2 mM spermidine, 10 mM sodium chloride, 0.5 mM of the four ribonucleoside triphosphates, RNasin (1.0 U/µl), 10 mM DTT, 10 U of T7 or SP6 RNA polymerase, and diethylpyrocarbonate-treated H2O. Samples were incubated for 1 h at 37°C, and the reaction was stopped by freezing the sample. When the PCR primer was used as a sequencing primer, the PCR-amplified segment of appropriate size was eluted from an agarose gel before the onset of transcription. For segments under 400 bp, the "freeze-squeeze" method was used (Tautz and Renz, 1983), and for segments over 400 bp, the GeneClean modification (Bio 101) of the "glass bead" protocol was used. Note that the amplification afforded by transcription obviates the need for quantitative elution during the gel purification.

4. Sequencing protocol [modification of Geliebter (1987)]: The transcription reaction (2 µl) was added to 10 µl of annealing buffer containing the end-labeled reverse transcriptase primer. Annealing and sequencing were performed essentially as described (Stoflet et al., 1988), except that less reverse transcriptase (2.5 units/reaction tube) was used. Note that [γ-32P]ATP is the correct donor for end-labeling.
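Table 1 records, for each species, only the positions at which its sequence differs from the oligonucleotide, with matching positions shown as dots. The following is a minimal Python sketch of that notation, not the authors' procedure; the function names and the example target sequence are hypothetical, and both strings are assumed to be aligned and written 5' to 3' in the same orientation. The 3'-end check reflects the point, made in the Results, that precise matches at the 3' end are critical.

```python
def mismatch_string(oligo: str, target: str) -> str:
    """Table 1-style notation: '.' where target matches the oligo, otherwise the target base."""
    oligo, target = oligo.upper(), target.upper()
    if len(oligo) != len(target):
        raise ValueError("oligo and target must be aligned to the same length")
    return "".join("." if o == t else t for o, t in zip(oligo, target))


def count_mismatches(oligo: str, target: str) -> int:
    """Number of positions at which the target differs from the oligo."""
    return sum(1 for c in mismatch_string(oligo, target) if c != ".")


def three_prime_mismatches(oligo: str, target: str, n: int = 2) -> int:
    """Mismatches within the n 3'-terminal bases (strings assumed to be written 5' to 3')."""
    return count_mismatches(oligo[-n:], target[-n:])


if __name__ == "__main__":
    oli_d = "GACTATGAAAATTCTACTGA"   # OliD as listed under Oligonucleotides
    target = "GACTATGAAGATTCTACTGA"  # hypothetical species sequence with one internal mismatch
    print(mismatch_string(oli_d, target))         # .........G..........
    print(count_mismatches(oli_d, target))        # 1
    print(three_prime_mismatches(oli_d, target))  # 0
```

In this toy case the single mismatch lies well away from the 3' end, the kind of difference the table suggests is compatible with obtaining sequence.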



Nomenclature

Since oligonucleotides accumulate rapidly when GAWTS is used, it is important to have informative names. The following nomenclature readily allows the determination of (i) the size of the amplified fragment, (ii) the appropriateness of any combination of oligonucleotides, and (iii) the origin and direction of the sequence generated. It is of the form G(O)-(I-L)R(C)-SD, where G is the gene abbreviation, O is the organism, I is the identifier (one or more) for any noncomplementary 5' bases, L is the length of the noncomplementary bases, R is the region of the gene, C is the location of the 5' complementary base, S is the total size, and D is the 5' to 3' direction of the oligonucleotide. The region of the gene (R) is abbreviated by 5', the region upstream of the gene; by E followed by the exon number; by I followed by the intron number; or by 3', the region downstream of the gene. The direction of the oligonucleotide is either U, upstream, or D, downstream. If a transcript has been defined, D is the sense direction and U is the antisense direction. Otherwise, the directions can be arbitrarily defined.

Thus F9(Hs)-(T7/TI-37)E6(20365)-51D is an oligonucleotide specific for the factor IX gene of Homo sapiens, which has a T7 promoter and a translation initiation sequence of 37 nucleotides (see Sarkar and Sommer, 1989, for sequence). It is complementary to a sequence in exon 6 that begins at base 20365 in the numbering system of Yoshitake et al. (1985). The oligonucleotide has a total of 51 nucleotides, and it heads downstream relative to factor IX mRNA. As another example, F9(Mm)E8(30949)-19U is an oligonucleotide specific for mouse (Mus musculus) factor IX. It is complementary to a sequence in exon 8 of the mouse factor IX gene that begins at the base that corresponds to the human sequence 30949. The oligonucleotide is 19 nucleotides, and the sequence heads upstream relative to in vivo transcription.

FIG. 2. Sequence strategy for mouse factor IX. The sequencing strategy for the other species was either identical or there were minor differences. The arrows indicate the direction of the sequencing. Both strands were sequenced for all species.

FIG. 3. Alignment of the nucleic acid sequences with human sequences. Nucleotides 40-174 represent the activation peptide. The absence of nucleotides in a species is indicated by X. In guinea pig, the 21-nucleotide in-frame repeat (82-102 and 112-132) that corresponds to the seven-amino-acid repeat is underlined. In the consensus sequence, the underlined nucleotides are identical in all species. The International Union of Biochemistry code is used for the nonidentical nucleotides.
[Sequence alignment not reproduced: human, pig, rabbit, guinea pig, rat and mouse (1-855), sheep (1-853), and consensus.]
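The consensus rule in the Fig. 3 legend can be illustrated with a short Python sketch: a column in which every species has the same base yields that base (and would be underlined), while a mixed column yields the IUB/IUPAC ambiguity code for the set of bases present. The toy sequences are hypothetical, and ignoring the X gap symbol when forming the code is an assumption, since the legend does not say how gapped columns were summarized.

```python
# IUB/IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(aligned: list[str], gap: str = "X") -> tuple[str, list[bool]]:
    """Return the consensus string and, per column, whether all sequences share one base.

    Gap symbols are ignored when choosing the ambiguity code (an assumption), and a
    column containing any gap is not counted as identical in all species.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    codes, identical = [], []
    for column in zip(*(s.upper() for s in aligned)):
        bases = frozenset(b for b in column if b != gap)
        if not bases:  # every sequence has a gap in this column
            codes.append(gap)
            identical.append(False)
            continue
        codes.append(IUPAC[bases])
        identical.append(len(bases) == 1 and gap not in column)
    return "".join(codes), identical


if __name__ == "__main__":
    cons, same = consensus(["ACGTA", "ACGTG", "ATGTA"])  # hypothetical toy alignment
    print(cons)                   # AYGTR
    print(sum(same) / len(same))  # 0.6
```

The identical-column flags are what a percent-identity figure would be computed from; this sketch simply averages them.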

Oligonucleotides 

The oligonucleotides used to sequence both strands (10.4 kb total) are listed with the abbreviation followed by the informative name:

1. OliA: F9(Hs)-(T7/TI-37)E6(20365)-51D: 5'GGATCCTAATACGACTCACTATAGGGAGACCACCATGCCATTTCCATGTGG3'

2. OliB: F9(Hs)E8(31366)-20U: 5'AGTGAGCTTTGTTTTTTCCT3'

3. OliC: F9(Hs)-(Sp6-29)E8(31346)-47U: 5'GGTACCATTTAGGTGACACTATAGAATACTAATCCAGTTGACATACC3'

4. OliD: F9(M)E6(20440)-20D: 5'GACTATGAAAATTCTACTGA3'

5. OliE: F9(Hs)E6(20537)-14U: 5'TCTTCTCCACCAAC3'

6. OliF: F9(Sh)-(T7-29)E7(30060)-45D: 5'GGATCCTAATACGACTCACTATAGGGAGATGCTGCATTCTGTGGA3'

7. OliG: F9(Hs)E7(30106)-20U: 5'GTTACAATCCATTTTTCATT3'

8. OliH: F9(M)E8(30849)-16U: 5'ATCCAGTTCCAGCAAG3'

9. OliI-1: F9(M)E8(30851)-18U: 5'ACAGAACAAAGGAGAAAT3'

10. OliI-2: F9(Hs)E8(30851)-18U: 5'ACAGAGCAAAAGCGAAAT3'

11. OliM: F9(Hs)E8(31013)-17D: 5'ATCTTCCTCAAATTTGG3'

12. OliN: F9(Hs)E8(31101)-17U: 5'CTAAGGTACTGAAGAAC3'

13. OliO: F9(Hs)E8(31157)-17D: 5'AACAACATGTTCTGTGC3'

14. OliP: F9(Hs)E8(31215)-16U: 5'CTATCTCCTTGACATG3'

15. OliQ: F9(M)E8(31227)-18U: 5'CTTCTACTTCAGTAACAT3'

16. OliR: F9(M)E8(31320)-20U: 5'TTAGTATATATTCCATATTT3'

FIG. 4. Alignment of the amino acid sequences with human sequences. The activation peptide is delimited by asterisks. The bovine sequence has previously been published (13). The number system is that of Yoshitake et al. (31).
[Amino acid alignment not reproduced: human, sheep, pig, rabbit, guinea pig, rat, mouse and cow.]
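The G(O)-(I-L)R(C)-SD naming scheme described under Nomenclature is regular enough to parse mechanically. Below is a minimal Python sketch, assuming the pattern is exactly as described there; the class, pattern and function names are hypothetical. It recovers each field and checks the total size S encoded in a name against the length of the listed sequence, and S minus L gives the length of the complementary portion.

```python
import re
from dataclasses import dataclass
from typing import Optional

# G(O)-(I-L)R(C)-SD, following the field definitions in the Nomenclature section.
NAME_RE = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)"                      # G: gene abbreviation, e.g. F9
    r"\((?P<organism>[A-Za-z]+)\)"                  # (O): organism, e.g. Hs, M, Sh
    r"(?:-\((?P<tag>[^()]+)-(?P<tag_len>\d+)\))?"   # optional -(I-L): noncomplementary 5' bases
    r"(?P<region>5'|3'|[EI]\d+)"                    # R: 5', 3', exon E# or intron I#
    r"\((?P<start>\d+)\)"                           # (C): position of the 5' complementary base
    r"-(?P<size>\d+)(?P<direction>[UD])$"           # -S D: total size and direction
)


@dataclass
class OligoName:
    gene: str
    organism: str
    tag: Optional[str]  # e.g. "T7/TI" or "Sp6"; None if there are no extra 5' bases
    tag_len: int        # L, 0 if absent
    region: str
    start: int
    size: int           # S
    direction: str      # "U" (upstream) or "D" (downstream)

    @property
    def complementary_len(self) -> int:
        """Bases complementary to the target: total size S minus noncomplementary length L."""
        return self.size - self.tag_len


def parse_name(name: str) -> OligoName:
    m = NAME_RE.match(name.strip())
    if m is None:
        raise ValueError(f"not a recognised oligonucleotide name: {name!r}")
    return OligoName(
        gene=m["gene"], organism=m["organism"], tag=m["tag"],
        tag_len=int(m["tag_len"] or 0), region=m["region"],
        start=int(m["start"]), size=int(m["size"]), direction=m["direction"],
    )


def size_matches(name: str, sequence: str) -> bool:
    """Check that the listed sequence length equals the total size S encoded in the name."""
    return parse_name(name).size == len(sequence)


if __name__ == "__main__":
    oli_a = parse_name("F9(Hs)-(T7/TI-37)E6(20365)-51D")
    print(oli_a.tag, oli_a.tag_len, oli_a.region, oli_a.size, oli_a.direction)  # T7/TI 37 E6 51 D
    print(oli_a.complementary_len)  # 14
    print(size_matches("F9(Hs)E8(31366)-20U", "AGTGAGCTTTGTTTTTTCCT"))  # True
```

For example, OliA has S = 51 and L = 37, leaving 14 complementary bases; the same length check can be run over every entry in the list above.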

RESULTS AND DISCUSSION 

Sequencing across Species 

RNA amplification with transcript sequencing (RAWTS) (Sarkar and Sommer, 1988, 1989) and genomic amplification with transcript sequencing (GAWTS) (Stoflet et al., 1988) are methods of direct sequencing that utilize a phage promoter sequence 5' to at least one of the PCR primers. RAWTS is a four-step procedure that includes (1) cDNA synthesis with oligo(dT), random primers, or an mRNA-specific oligonucleotide primer; (2) PCR where one or both oligonucleotides contain a phage promoter attached to a sequence complementary to the region to be amplified; (3) transcription with a phage polymerase; and (4) dideoxy sequencing with reverse transcriptase. The procedure for GAWTS is identical except that genomic DNA is the input to step 2. RAWTS/GAWTS has a number of advantages: (1) the transcription step produces an additional level of amplification, which obviates the need for purification subsequent to PCR; (2) the amplification afforded by transcription can compensate for a suboptimal PCR; and (3) the generation of a single-stranded template provides more reproducible sequence than that obtained from a double-stranded template. Disadvantages of the technique are the limited number of different sequencing enzymes available and the added expense of attaching phage promoters to the PCR primers. As with all direct sequencing methods, RAWTS is insensitive to the error rate of Taq polymerase because the sequence generated is the average for a population of molecules.
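The last point, that direct sequencing reports the average of a population of molecules and is therefore insensitive to Taq polymerase errors, can be illustrated with a deliberately simplified Python model. It assumes that each amplified molecule carries independent random substitutions and that the sequence read at each position is the majority base; the template, error rate and molecule count are hypothetical values, not data from this study, and a real sequencing ladder is not literally a majority vote.

```python
import random
from collections import Counter

BASES = "ACGT"


def amplify_with_errors(template: str, n_molecules: int, error_rate: float,
                        rng: random.Random) -> list[str]:
    """Copy the template n_molecules times, substituting each base with probability error_rate."""
    molecules = []
    for _ in range(n_molecules):
        copy = []
        for base in template:
            if rng.random() < error_rate:
                # Substitute one of the three other bases (simplified error model).
                copy.append(rng.choice([b for b in BASES if b != base]))
            else:
                copy.append(base)
        molecules.append("".join(copy))
    return molecules


def population_consensus(molecules: list[str]) -> str:
    """Majority base at each position, standing in for the signal of a direct-sequencing read."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*molecules))


if __name__ == "__main__":
    rng = random.Random(1)
    template = "ACCATGCCATTTCCATGTGG"  # hypothetical 20-nt template
    molecules = amplify_with_errors(template, n_molecules=500, error_rate=0.01, rng=rng)
    with_errors = sum(m != template for m in molecules)
    print(f"{with_errors} of {len(molecules)} molecules carry at least one error")
    print("population consensus matches template:", population_consensus(molecules) == template)
```

Under these assumptions a single cloned molecule has a sizeable chance of carrying an error, whereas the population-level read recovers the template, which is the contrast the text draws between cloning and direct sequencing.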

RNA amplification with transcript sequencing was performed on mRNA from the livers of pig (Sus scrofa), sheep (Ovis aries), rabbit (Oryctolagus cuniculus), guinea pig (Cavia porcellus), mouse (Mus musculus), and rat (Rattus norvegicus). mRNA was chosen as the input nucleic acid rather than genomic DNA because sequence from more than one exon was desired and oligonucleotides complementary to coding sequence were deemed more likely to produce successful amplification across species.

Previously, some initial sequence was obtained across species after a segment that included sections of exons G and H was amplified with available human primers (Sarkar and Sommer, 1989). To obtain the sequence of a much larger segment that included the activation peptide and the catalytic domain, the published human and bovine amino acid sequences were compared. PCR primers were then made to conserved amino acids at the beginning of exon F (OliA) and the end of the coding region of exon H (OliB). Since precise matches at the 3' end are critical (Sommer et al., in press), the primers were designed to avoid possible codon redundancy in the two 3'-terminal bases of the oligonucleotide. When the magnesium concentration was varied from 2 to 8 mM, segments of the predicted size were seen from each of the species (Fig. 1A). The segment was eluted from an agarose gel, transcribed, and sequenced by reverse transcriptase with OliB (Fig. 1B).
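One way to apply the rule of avoiding codon redundancy in the two 3'-terminal bases is as follows. When a sense primer is reverse-translated from conserved amino acids, let it end at the second base of a codon whose first two bases are the same in every synonymous codon. That interpretation is an assumption; the text gives the goal, not a procedure.

The minimal Python sketch below uses the standard genetic code. The function names are illustrative.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codons(aa: str) -> list[str]:
    """All codons that encode the one-letter amino acid `aa`."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def first_two_bases_fixed(aa: str) -> bool:
    """True if every codon for `aa` has the same first and second bases."""
    return len({codon[:2] for codon in synonymous_codons(aa)}) == 1


def primer_end_sites(peptide: str) -> list[int]:
    """1-based residues whose codon could carry a sense primer's 3' end at base 2
    with both 3'-terminal bases unambiguous."""
    return [i for i, aa in enumerate(peptide, start=1) if first_two_bases_fixed(aa)]


print(primer_end_sites("MKLW"))  # -> [1, 2, 4]; Leu is excluded (TTx and CTx both encode it)
```

Only Leu, Arg and Ser (and stop) fail the test, because their codons differ at the first or second base. The same criterion applies to an antisense primer such as OliB. Its two 3'-terminal bases pair with the first two bases of the codon at the 5' edge of its binding site.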
The sequence indicated that the factor IX homologs had been obtained. Since the sequence adjacent to OliB was well conserved, PCR was performed from cDNA with OliA and OliC, a primer containing an SP6 promoter sequence. As predicted, factor IX segments (T7-SP6 segments) of about 900 bp (including the phage promoter sequences) were amplified from all six species. Sequence was obtained from one end by transcribing with T7 RNA polymerase and from the other end by transcribing with SP6 RNA polymerase.

In some cases, the T7-SP6 segment of a given species could be directly transcribed and sequenced with an internal oligonucleotide that had previously been synthesized to delineate mutations in individuals with hemophilia B (Koeberl et al., 1989). Unfortunately, mismatches with the cross-species sequence often precluded efficient priming in the sequencing reaction. PCR was found to be more forgiving of mismatches. Thus, in most cases, sequence was obtained by performing a second round of PCR with an internal oligonucleotide, purifying the segment of appropriate size, and sequencing with the internal PCR primer.

More specifically, PCR was performed with a given internal oligonucleotide and either the SP6- or the T7-containing oligonucleotide on, typically, a 3000-fold to 40,000-fold dilution of the 900-bp T7-SP6 segment. The amplified segment of appropriate size was purified from both primer dimers and spurious amplification products by elution from an agarose gel. Either the freeze-squeeze or the glass bead technique was used, depending on the size of the amplified segment (see Materials and Methods). The segment was then transcribed with the appropriate phage polymerase and sequenced with the PCR primer.



TABLE 3

Factor IX Sequences That Are Conserved in Seven Species^a

                                                Activation     Catalytic
                                                peptide^b (%)  domain^c (%)
Completely conserved amino acids                23             69
Amino acids with highly conservative changes^d  17             8
Completely conserved nucleotides                42             67
Completely conserved third base of codon        37             42

^a The previously published bovine sequence is excluded because only amino acid sequence was determined.
^b Excluding the insertion in the rodents.
^c Excluding the last eight amino acids.
^d The allowed highly conservative substitutions are I/V/L, F/Y, E/D, E/Q, D/N, K/R, T/S, or S/A.



Since the input for the second round of PCR is the T7-SP6 segment, the majority of the DNA molecules contain the segment of interest. If the match between the internal oligonucleotide and the species sequence is good, a 40,000-fold dilution usually provides a strong amplification signal with a minimum of spurious amplification products. Even three or four mismatches can be tolerated if they are not near the 3' end, where elongation is initiated (Table 1). For mismatches close to the 3' end, it was necessary to decrease the dilution of the input DNA. With only a 100-fold dilution of the T7-SP6 segment, it was sometimes possible to obtain amplification even if the mismatch occurred at the 3'



TABLE 2

Pairwise Differences in the Sequences for the Activation Peptide and the Catalytic Domain^a

             Relative to mouse (%)               Relative to humans (%)
             Amino    Nucleic  Third base        Amino    Nucleic  Third base
             acid     acid     of codon          acid     acid     of codon
Mouse        -        -        -                 37/20    19/15    17/25
Rat          29/2     15/3     17/6              40/20    25/13    23/21
Guinea pig   60/14    33/16    31/30             15/18    22/13    14/22
Rabbit       41/15    26/15    29/28             46/20    21/12    14/20
Pig          37/15    24/19    26/34             43/18    21/14    20/28
Sheep        37/18    22/19    23/32             40/16    17/13    14/19
Human        37/20    19/15    14/25             -        -        -
Average^b    42/16    25/17    25/30             43/19    21/13    17/23

^a The comparisons for the activation peptide are based on only the 35 amino acids that all these peptides have in common. For each category, the percentage of different amino acids or nucleotides for the activation peptide is presented to the left of the slash, followed by the corresponding value for the catalytic domain. The bovine sequence (13) was excluded from these comparisons because only amino acid sequence is available.
^b The average for mouse excludes the closely related rat sequence.
FIG. 5. Types of amino acid conservation in factor IX. The bovine sequence (13) is included [remainder of legend illegible in the source scan]. The sequence of human factor IX is [illegible]. (Recoverable figure labels: Factor VIIa or XIa; Rodent insertion; Activation peptide.)


end of the oligonucleotide (Sarkar et al., manuscript in preparation).
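The observations above suggest a simple screen for reusing a primer on another species' sequence. Several mismatches are tolerated away from the 3' end, but mismatches near the 3' end need more input template.

The minimal Python sketch below assumes two thresholds: at most four mismatches, and none in the three 3'-terminal bases. These are illustrative values loosely based on the text, not values the authors report. Function names are hypothetical.

```python
def mismatch_positions_from_3prime(primer, site):
    """Distances of mismatches from the 3' end (1 = 3'-terminal base).

    `primer` and `site` are both written 5'->3' on the same strand and aligned
    base for base, i.e. `site` is the sequence the primer is meant to match.
    """
    if len(primer) != len(site):
        raise ValueError("primer and target site must be the same length")
    n = len(primer)
    return sorted(n - i for i, (p, s) in enumerate(zip(primer, site)) if p != s)


def likely_to_prime(primer, site, max_mismatches=4, protected_3prime=3):
    """Heuristic screen; both thresholds are illustrative assumptions."""
    distances = mismatch_positions_from_3prime(primer, site)
    return len(distances) <= max_mismatches and all(d > protected_3prime for d in distances)


oli_p = "CTATCTCCTTGACATG"
print(likely_to_prime(oli_p, "GAATCTCCTTGACATG"))  # two 5'-end mismatches: passes
print(likely_to_prime(oli_p, "CTATCTCCTTGACATC"))  # 3'-terminal mismatch: fails
```

As the text notes, a 3'-terminal mismatch does not always prevent amplification; it may instead call for a less dilute template. A real screen would therefore flag such primers rather than reject them.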

Sequence Comparisons 

By sequencing of transcripts generated by T7 and SP6 RNA polymerase, respectively, data were obtained from both strands (Fig. 2). The nucleic acid and the deduced amino acid sequences were aligned with the human sequence (Figs. 3 and 4). Pairwise comparisons with the mouse factor IX sequence indicate that rat factor IX is more closely related to mouse than the other sequences are (Table 2). The guinea pig factor IX sequence, by contrast, has diverged as much as the rabbit, ungulate, and primate sequences. This is compatible with a postulated increased rate of evolution in rodents (Li et al., 1987; Wu and Li, 1985). An alternative, perhaps less likely, possibility is that there are major misinterpretations of the rodent fossil record (Wilson et al., 1987).
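The entries in Table 2 are percentages of differing positions between aligned sequences. They are reported separately for amino acids, all nucleotides, and codon third bases.

The minimal Python sketch below assumes gap-free, in-frame coding sequences of equal length. Insertions such as the rodent-specific one must be removed first, as the Table 2 footnote does by restricting the comparison to shared residues. Function names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def pairwise_divergence(cds_a, cds_b):
    """Percent differing amino acids, nucleotides, and codon third bases."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("need equal-length, in-frame coding sequences")
    codons_a = [cds_a[i:i + 3] for i in range(0, len(cds_a), 3)]
    codons_b = [cds_b[i:i + 3] for i in range(0, len(cds_b), 3)]
    n_codons = len(codons_a)
    aa_diff = sum(CODE[a] != CODE[b] for a, b in zip(codons_a, codons_b))
    nt_diff = sum(x != y for x, y in zip(cds_a, cds_b))
    third_diff = sum(a[2] != b[2] for a, b in zip(codons_a, codons_b))
    return (
        100 * aa_diff / n_codons,
        100 * nt_diff / len(cds_a),
        100 * third_diff / n_codons,
    )


# Two synonymous third-base changes: no amino acid difference.
print(pairwise_divergence("AAAGGGCTT", "AAGGGGCTC"))
```

In the example, 0% of amino acids, 22% of nucleotides and 67% of third bases differ. This shows why third-base divergence in Table 2 can be large even where amino acid sequence is conserved.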

Pairwise comparisons with the human factor IX sequence show the same average amino acid divergence as that seen when mouse was the comparison standard (Table 2). Curiously, there is less deviation at the third base of the codon when humans are the comparison standard, but the average divergence of the first and second bases is similar in comparisons with human or
TABLE 4

Sequence of Potential N-Glycosylation Sites

              Position of asparagine^a
Species       157   In^b  167   172   228   249   260
Human         NST         NIT
Rabbit        NFT         NVT         NIT         NAT
Guinea pig    NFT         NST               NVT   NAS
Rat           NST   NST   NLT                     NAT
Mouse         NST         NVT                     NAT
Pig           NST               NQS               NAT
Sheep         NSS         NVT   NQS               NAS
Cow           NSS         NVT   NQS               NAS

^a The numbering system is that of human factor IX (31).
^b This asparagine is located in the rodent-specific insertion.



mouse. The difference in the third base of the codon 
may reflect differences in the codon biases found in 
mouse (data not shown). 

Patterns of Conservation 

For the activation peptide, 42% of the bases are invariant in all seven species listed in Table 2. The catalytic domain is more conserved, with 67% of the bases invariant (P < 0.001 by the binomial distribution) (Table 3). The deduced amino acid sequence indicates that there is a deletion in humans of a lysine located three amino acids before the beginning of the activation peptide (Fig. 5). In addition, an insertion is seen in rodents. The size and the sequence of the insertion are not well conserved. In the guinea pig, most of the insertion is part of a seven-amino-acid repeat. This repeat is presumably of recent origin because the corresponding cDNA sequence is a precise 21-base repeat (Fig. 3). These were the only deletions and insertions found. The presence of a nine-amino-acid insertion and the marked conservation of the carboxy terminus relative to humans agree with an abstract reporting the cloning of mouse factor IX cDNA (Wu et al., 1988).
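The text reports P < 0.001 "by the binomial distribution" without giving the exact test. The Python sketch below is one plausible reconstruction, under stated assumptions:

- The 105 nucleotides of the 35-residue activation peptide are treated as trials.
- The question is how likely 44 or fewer invariant bases (about 42%) would be if each base were invariant with the catalytic-domain frequency of 0.67, taken as exact.

This is an illustrative sketch, not the authors' calculation.

```python
from math import comb


def binomial_lower_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n_sites = 35 * 3                              # activation-peptide codons x 3 (assumed)
observed_invariant = round(0.42 * n_sites)    # about 44 invariant bases
p_value = binomial_lower_tail(observed_invariant, n_sites, 0.67)
print(f"{observed_invariant}/{n_sites} invariant, one-sided P = {p_value:.2e}")
```

Under these assumptions the tail probability is far below 0.001, consistent with the reported result. A two-proportion comparison, which treats both domains' frequencies as estimates, would be a more conservative alternative.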

Twenty-three percent of the activation peptide amino acids and 69% of the catalytic domain amino acids were identical in the seven species (Table 3). In the activation peptide, another 17% of the amino acids exhibited only highly conservative substitutions. In the catalytic domain, only 8% exhibited such highly conservative changes. This gives a ratio of identity to highly conservative changes of 8.6:1 in the catalytic domain, compared with 23:17, or about 1.4:1, in the activation peptide. The remaining 60% and 23% of the amino acids in the activation peptide and catalytic domain, respectively, were either less conserved or unconserved. While the catalytic domain is more conserved, the activation peptide is also substantially conserved, and its conserved residues are not clustered near the cleavage sites. Both observations suggest that it may function as more than a spacer between the heavy and light chains. In in vitro activation of bovine factor IX with factor XIa, it was observed that the cleaved activation peptide was noncovalently associated with factor IXa (Amphlett et al., 1979). If noncovalent association after cleavage were of physiological importance, such an interaction might account for some of the conservation that is seen.
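The categories in Table 3 classify each aligned position as completely conserved, changed only by the allowed highly conservative substitutions, or neither. The text does not say how columns with three or more different residues were scored. The Python sketch below therefore assumes that a column counts as conservative only if every pair of distinct residues in it is an allowed pair. Under that rule, E/D/Q counts as "other", because D/Q is not listed. Function names are illustrative.

```python
from itertools import combinations

# Allowed highly conservative substitutions from the Table 3 footnote.
ALLOWED_PAIRS = {
    frozenset(pair)
    for pair in ["IV", "IL", "VL", "FY", "ED", "EQ", "DN", "KR", "TS", "SA"]
}


def classify_column(residues):
    """'identical', 'conservative', or 'other' for one alignment column."""
    distinct = set(residues)
    if len(distinct) == 1:
        return "identical"
    if all(frozenset(pair) in ALLOWED_PAIRS for pair in combinations(distinct, 2)):
        return "conservative"
    return "other"


def conservation_profile(aligned):
    """Percentage of columns in each class for equal-length, gap-free sequences."""
    columns = list(zip(*aligned))
    counts = {"identical": 0, "conservative": 0, "other": 0}
    for column in columns:
        counts[classify_column(column)] += 1
    return {k: 100 * v / len(columns) for k, v in counts.items()}


print(conservation_profile(["KEAI", "KDGV", "KEAL"]))
```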

The two known N-glycosylation sites in the human activation peptide (Balland et al., 1988) are conserved in all species with the exception of pig, where the second site is lost or perhaps displaced six amino acids downstream (Table 4). In humans, a conserved potential glycosylation site at asparagine 260 (catalytic domain) is lost. This site is known to be glycosylated in cow (Mizuochi et al., 1983), strongly suggesting that a common glycosylation site found in mammals has been lost in human factor IX. While all the sequenced factor IX proteins are likely to be glycosylated, the precise location of the glycosylation seems not to be crucial, save perhaps the site at asparagine 157. The N-glycosylation sites constrain the evolution of the activation peptide. The conservation of glycosylation may reflect its importance in maintaining an appropriate turnover rate for factor IX in the circulation.
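Table 4 lists asparagines in potential N-glycosylation sites. These are conventionally identified as the sequon N-X-S/T, where X is any residue except proline, and every entry in Table 4 fits that pattern.

Below is a minimal Python sketch of such a scan. The function name and toy sequences are illustrative. The `offset` parameter lets positions be reported in a chosen numbering, such as the human factor IX numbering used in Table 4.

```python
import re

# N-X-S/T with X not proline; the zero-width lookahead also reports overlapping sequons.
SEQUON = re.compile(r"(?=N[^P][ST])")


def sequon_positions(protein, offset=1):
    """Positions of asparagines in N-X-S/T sequons, numbered from `offset`."""
    return [m.start() + offset for m in SEQUON.finditer(protein)]


print(sequon_positions("AANSTANPTANIT"))  # -> [3, 11]; N-P-T is skipped
```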

In each species, there are a few divergences from otherwise completely conserved amino acids. In humans, proline is present instead of serine at position 151. Likewise, threonine is present in place of proline 225, and valine is present in place of isoleucine 322. Such substitutions may be tolerated in humans because an apparently compensatory change occurred elsewhere. That change may lie at another position in factor IX (perhaps the serine-to-proline and the proline-to-threonine substitutions constitute a pair of compensating mutations) or, conceivably, in one of the other proteins that interact with factor IX. This hypothesis predicts, for example, that an in vitro mutation of proline 151 to serine would inactivate human factor IX. The simultaneous mutation of proline 151 and a compensatory change would be required for human factor IX to retain its activity.

Types of Amino Acid Conservation 

An alignment was performed for the amino acid sequences of human factors VII, IX, X, protein C, and trypsin (Hagen et al., 1986; Fung et al., 1985; Beckman et al., 1985; Emi et al., 1986). Bovine factors IX, X, protein C, and trypsin were also included (Katayama et al., 1979; Fung et al., 1985; Long et al., 1984; Mikes et al., 1966, as modified by Kossiakoff et al., 1977). Thus a total of seven coagulation proteases and two trypsins were aligned. Conserved amino acids were identified in the catalytic domain, but not in the activation peptide (data not shown). Different levels of amino acid conservation were seen in the catalytic domain (Fig. 5). Those that were identical or highly conserved can be viewed as generic.

Twenty percent of the amino acids in the catalytic domain are identical in the four coagulation proteases (Fig. 5, circles). A comparison with Fig. 5 indicates that all these amino acids were identical in the eight species of factor IX. Sixteen percent of the amino acids vary in a highly conservative fashion in the coagulation proteases (Fig. 5, pentagons). Of these, some show the same variation in factor IX [e.g., amino acids 181 (V or I) and 400 (K or R)] (pentagons with asterisks inside). The others were identical in all species of factor IX [e.g., amino acid 210 (I but not L) and amino acid 218 (T but not S)] (pentagons without asterisks). There were two exceptions to the conservation. Both glutamate and aspartate appear in the five coagulation proteases at the positions that align with factor IX amino acids 235 and 292. However, guinea pig has a lysine in position 235, and four species have an asparagine in position 292.

Seven percent of the amino acids are identical in the four coagulation proteases except for a nonconservative change in one of the following: factor VII, factor X, or protein C (squares). All of these amino acids were identical in the eight factor IX sequences. In total, 43% (20% + 16% + 7%) of the amino acids of the catalytic domain are generic for the coagulation proteases. In addition, 34% of the conserved amino acids are specific for factor IX. Ninety percent of these factor IX-specific sites are identical (triangles) in all eight species, while 10% undergo highly conservative substitution (triangles with asterisks). In the activation peptide, 3% of the amino acids are generic and 37% are specific for factor IX. Thus, both domains have about the same frequency of factor IX-specific amino acids, while the frequency of amino acids generic to the coagulation proteases varies markedly.

From the factor IX-specific sequences, clusters of conserved amino acids were arbitrarily defined. A cluster requires at least four consecutive conserved amino acids for nucleation and at least two consecutive nonconserved or protease-generic amino acids for termination (Fig. 5, segments 1-3). In vitro mutagenesis, or analysis of CRM+ hemophiliac factor IX proteins with mutations in these clusters, will allow an assessment of the importance of these clusters. The relevant interactions are the specific ones of factor IX with factors VII, VIII, X, XI, and antithrombin III.
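The cluster rule stated above is explicit enough to code directly. The minimal Python sketch below makes these assumptions:

- Input is one label per residue: "S" for a conserved factor IX-specific residue and any other character for a nonconserved or protease-generic one.
- After nucleation, a cluster absorbs isolated non-"S" residues.
- A cluster ends at the last "S" before a run of two or more non-"S" residues.

The function name is hypothetical.

```python
def conserved_clusters(labels, nucleate=4, terminate=2):
    """Return (start, end) 0-based inclusive index pairs of clusters."""
    clusters = []
    i, n = 0, len(labels)
    while i < n:
        # Nucleation: `nucleate` consecutive conserved residues starting at i.
        if labels[i:i + nucleate] != "S" * nucleate:
            i += 1
            continue
        start, end = i, i + nucleate - 1
        gap = 0
        # Extension: continue until `terminate` consecutive non-conserved residues.
        for j in range(end + 1, n):
            if labels[j] == "S":
                end, gap = j, 0
            else:
                gap += 1
                if gap >= terminate:
                    break
        clusters.append((start, end))
        i = end + 1
    return clusters


print(conserved_clusters("xxSSSSxSSxxSSSxx"))  # -> [(2, 8)]; the later run of three never nucleates
```

Because nucleation is searched left to right, a short conserved run that precedes a qualifying run is not joined to it. That is one consequence of the rule being "arbitrarily defined", and a different scan order could give slightly different boundaries.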

In conclusion, the remarkable tolerance for mismatches demonstrated herein implies that sequence can be generated for a battery of species with a minimum of oligonucleotides. mRNA was the source of the sequence in these experiments, but cDNA or genomic libraries or genomic DNA could also be used. The enhanced rate of cross-species sequencing afforded by ZooRAWTS will be applicable generally to studies concerned with structure-function and/or evolution.
Within the past 20 years or so, factor IX has been at the centre of particularly intensive studies of its physiology, pathology and biochemistry as well as its molecular genetics and biology. With the complete nucleotide sequence of its human gene determined in 1985 and the molecular defects of over 600 abnormal human factor IX genes analysed to date, factor IX is among the few mammalian proteins which have been exhaustively studied in almost every aspect. The enormous amount of information we now have on this medium-sized plasma protein sheds light on several questions. These include how a gene and its protein evolve, how the protein carries out a highly regulated, specific and pivotal role in the delicately balanced blood coagulation reaction, and how clinical presentations correlate with its highly diverse molecular mechanisms of defects. This wealth of knowledge makes factor IX an excellent model for deeper study, such as truly quantitative analysis of its structure-function relationship and in vivo function and regulation. It will also provide a sound foundation which may lead to improved treatment of haemophilia B and perhaps to its cure. This paper attempts to review the recent progress in research on factor IX.

Key words: Haemophilia, factor IX, molecular genetics, review. 



Introduction 

Blood coagulation is the principal mechanism which follows the initial platelet plug formation to stop blood loss after vascular injury.^1 The basic mechanism of blood coagulation and its regulation involves more than 20 protein factors in addition to calcium ions and phospholipids.^1,2 In this mechanism, factor IX plays a crucial role, occupying a key juncture between the intrinsic pathway involving factor XI and the extrinsic pathway involving factor VII and tissue factor (Figure 1). After activation by these pathways, factor IX in turn activates factor X in the presence of factor VIII, Ca2+ and a phospholipid surface. A deficiency of factor IX in the circulation results in a bleeding disorder, haemophilia B.

Recently, factor XI was shown to be activated by thrombin, resulting in significant revisions of the coagulation cascade (Figure 1).^3,4 These revisions have indicated the important roles of thrombin and factor XI in the initiation and maintenance of blood coagulation. They have also explained the lack of bleeding disorders due to deficiencies of factor XII, prekallikrein, and high molecular weight kininogen. Both haemophilia A (deficiency of factor VIII) and haemophilia B patients bleed, despite a normal amount of factor VII in their circulation and a sufficient amount of tissue factor. Generation of factor Xa catalysed by the pathway involving the factor IXa-factor VIII complex is therefore evidently crucial for the stable maintenance of coagulation.

Activation of factor IX and/or factor X by the factor VIIa-tissue factor complex upon vascular injury may be essential for the initiation of coagulation. It would generate the initial minute amount of thrombin, which may in turn activate factor XI to factor XIa, leading to subsequent production of factor IXa.^2 The activation pathway of factor X by factor VIIa-tissue factor, however, appears to be tightly controlled by lipoprotein-associated coagulation inhibitor (also called extrinsic pathway inhibitor), suggesting that its role in maintenance of coagulation is transient, if any.^2,5,6 The activation pathway of factor IX by the factor VIIa-tissue factor complex, however, may still play an important role in maintenance of coagulation in vivo.^7

This article reviews current knowledge of the biology of factor IX: its structure-function relationships, gene structure and abnormal genes, regulation of the gene and the current status of developing new therapy for haemophilia B.

Figure 1. Basic mechanism of blood coagulation. PL indicates phospholipids. Activated forms of coagulation factors are shown with a suffix 'a'.

Structure of factor IX 

Human factor IX is synthesized as a prepro form of a single polypeptide chain (Figure 2).^8,9 Prepro factor IX is composed of several distinct domain (or module) structures:

- the preleader (also called signal peptide), which spans amino acid (aa) -46 through -19;
- the proleader (also called propeptide), spanning aa -18 through -1;
- the Gla domain (the amino-terminal region of about 40 amino acid residues starting at aa +1), containing twelve γ-carboxylated glutamic acid (Gla) residues;
- a short hydrophobic sequence;
- two epidermal growth factor-like domains (each about 40 amino acid residues in length);
- a linking sequence;
- the activation peptide (35 amino acid residues in length);
- the catalytic subunit of 235 amino acid residues.

Intron positions relative to the amino acid sequence divide these domain structures in a characteristic manner. During secretion, both the signal peptide and the propeptide are cleaved off, and mature factor IX (the plasma form, 415 amino acid residues in length) is produced.

Factor IX cDNA has three Met codons in the same reading frame, clustered in the amino-terminal region at aa -46, -41 and -39. Of these, the third Met residue (aa -39) lies in the sequence (ATCATGG) that best matches the Kozak consensus sequence (NNPuNNATGGNN),^10 suggesting that this Met could be the primary translation start site. This is further supported by the sequences of different species. They have a conserved Met at aa -39 but lack a Met residue at -41 (dog, rat and mouse) and at -46 (dog and rat).^11 According to the ATG scanning model,^12 the first ATG at -46 may still be used for translation initiation, albeit at a low level.

Prepro factor IX undergoes several co- and post-translational modifications. Its signal peptide is cleaved off by signal peptidase during the secretion of the nascent polypeptide chain. The propeptide sequence is eventually cleaved off by a processing protease during the secretion of factor IX protein. Proteases which may be responsible for removing propeptides with a dibasic amino acid sequence at the carboxyl terminus have been isolated and their cDNAs cloned.^13 These proteases are either a metalloendopeptidase^13 or subtilisin-type proteases (PACE).^14 In co-expression experiments with factor IX in Chinese hamster ovary (CHO) cells, PACE could enhance propeptide cleavage, which takes place late in the secretory pathway.^15 Whether both or only one of these different endopeptidases is responsible for processing in the liver is not known.
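The Kozak comparison above rests mainly on two positions: a purine (Pu, A or G) at -3 and a G at +4 relative to the A of the ATG. These are the positions usually weighted most heavily in the consensus NNPuNNATGGNN.

The minimal Python sketch below checks those two positions. The function name is illustrative. Only the ATCATGG context quoted in the text is used, since the contexts of the other two Met codons are not given here.

```python
def kozak_key_positions(seq, atg_index):
    """Check the two strongest Kozak positions around an ATG starting at
    0-based `atg_index`: a purine at -3 and a G at +4 (the base after ATG)."""
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    minus3 = seq[atg_index - 3] if atg_index >= 3 else None
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else None
    return {"purine_at_-3": minus3 in ("A", "G"), "G_at_+4": plus4 == "G"}


# Context of the Met at aa -39, as quoted in the text.
print(kozak_key_positions("ATCATGG", 3))  # -> {'purine_at_-3': True, 'G_at_+4': True}
```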

The mature plasma factor IX is a single polypeptide chain starting with Tyr at aa +1 and ending with Thr at aa +415. Interestingly, all other homologous vitamin K dependent proteins of the coagulation system have Ala at position +1 (Figure 2).^9 Mutagenesis studies have shown that replacing the Tyr residue at this position with Ala significantly improves the cleavage of the propeptide, with little effect on γ-carboxylation of recombinant factor IX.^16 Why factor IX maintains a Tyr residue at this position is not known; it may have some biological significance in the overall regulation of coagulation.

Plasma factor IX is secreted from hepatocytes into the bloodstream. During blood coagulation, plasma factor IX undergoes limited proteolysis that frees a 35-residue activation peptide, converting it to the activated form, factor IXa. This is catalysed either by factor XIa in the presence of Ca2+ ions or by factor VIIa in the presence of tissue factor and Ca2+ ions. Factor IXa is composed of a light chain (the amino-terminal half, aa 1-145) and a heavy chain (the carboxyl-terminal half, aa 181-415). The light chain contains five structural domains with various functions in regulating factor IX; the heavy chain contains the protease domain (catalytic subunit).
Figure 2. Amino acid sequence and tentative domain structure of human prepro factor IX. Numbering of the amino acid sequence: positive numbers for the mature form of factor IX, starting with Tyr at +1; negative numbers, in the reverse direction, for the prepro leader sequence, starting with Arg at -1. Intron positions in relation to the amino acid sequence are shown with residue numbers in parentheses. Arrows show the locations of peptide bonds which are cleaved during processing and activation of prepro factor IX. Circled residues in the catalytic subunit indicate key amino acid residues involved in the active site. Modified from Yoshitake et al.^9



The propeptide sequence of factor IX (aa -18 to -1) plays an important role in the vitamin K dependent γ-carboxylation of the twelve glutamic acid residues contained in the Gla domain.^17,18 The importance of the propeptide has also been shown for other similar vitamin K dependent factors such as protein C.^19 Gla residues in the Gla domain play an essential role in the biological function of factor IX as Ca2+ binding sites. Studies on abnormal factor IX genes and a series of mutagenesis analyses carried out on the Gla domain and propeptide sequence have shed light on the mechanism responsible for the modification.^17,18

The propeptide alone, without the Gla domain, can serve as the recognition site for the vitamin K dependent carboxylase, which is embedded in the rough endoplasmic reticulum membrane. This is supported by the finding that synthetic propeptides alone can augment carboxylase activity.^17,18 Maintenance of the approximate size of the intact propeptide is apparently important for its function. Some residues of the propeptide are critical for the reaction, including those at aa -18, -17, -16 (Phe; conserved among vitamin K dependent proteins), -15, and -10 (Ala; conserved). In these studies, the Arg residues at aa -4 and -1 were shown not to be critical for the γ-carboxylation reaction.

Interestingly, however, two mutant factor IX molecules are transported into the circulation with their propeptides uncleaved: factor IX Cambridge (Arg-1 changed to Ser)^20 and factor IX Sajl rw (Arg-4 changed to Glu).^21 Both show decreased levels of γ-carboxylation. Mutations at aa -1 and -4 not only inhibit the proper cleavage of the peptide bond between Arg-1 and Tyr+1, but may also affect γ-carboxylation to some extent by changing the conformation of the propeptide and Gla domains. In γ-carboxylation, the propeptide bound to carboxylase may anchor the nascent factor IX polypeptide chain. The active site of the carboxylase could then specifically recognize the unmodified Gla domain region and scan it, modifying the twelve Glu residues in the region to Gla residues.^17,18
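The Arg residues at -4 and -1 altered in the two uncleaved mutants above sit where a PACE-type processing protease would recognize a propeptide ending in basic residues. The minimal Python sketch below assumes the commonly cited furin/PACE rule, which the review itself does not spell out: a minimal R-X-X-R site at the propeptide C terminus, with R-X-[K/R]-R preferred. The sequences are hypothetical placeholders, not factor IX.

```python
import re

# Assumed furin/PACE-type rules, anchored at the propeptide C terminus (residue -1).
PREFERRED = re.compile(r"R.[KR]R$")   # R-X-[K/R]-R
MINIMAL = re.compile(r"R..R$")        # R-X-X-R


def cleavage_motif(propeptide):
    """Classify the last four propeptide residues as 'preferred', 'minimal' or 'none'."""
    if PREFERRED.search(propeptide):
        return "preferred"
    if MINIMAL.search(propeptide):
        return "minimal"
    return "none"


# Hypothetical propeptide ends: intact, Arg(-1) -> Ser, and Arg(-4) -> Glu.
for label, seq in [("intact", "GGGGRAKR"), ("R-1S", "GGGGRAKS"), ("R-4E", "GGGGEAKR")]:
    print(label, cleavage_motif(seq))
```

Under this rule both substitutions destroy the recognition site, which fits the uncleaved propeptides reported for the two mutants.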

Recently, a cDNA clone for the membrane-integrated, vitamin K dependent γ-glutamyl carboxylase was isolated.^22 When expressed in COS-1 cells and CHO cells, this carboxylase can augment the in vitro γ-carboxylase activity of microsomal preparations by 17- and 16-fold, respectively,^23 agreeing reasonably well with the previous observation.^22 In contrast, transient co-transfection of the γ-carboxylase expression vector into factor IX-expressing CHO cells did not improve the specific procoagulant activity of secreted factor IX. This suggests that γ-carboxylation of factor IX is not limited by the expression of the vitamin K dependent γ-carboxylase alone.^23

A very high expression of factor IX in hepatoma cells in culture (>100 μg/ml medium) results in a poor specific activity of factor IX (only 1.5%), apparently due to poor γ-carboxylation.^24 Various heterologous cell lines such as BHK cells and CHO cells can produce a relatively high level of recombinant factor IX (1-3 μg/10^6 cells/day), but its specific activity is also low, varying in a range of 25-70%.^25 Furthermore, a significant fraction (20%) of factor IX secreted from CHO cells escapes proper cleavage of the propeptide, resulting in inactive factor IX molecules with the propeptide still attached.^26 These data indicate that cultured cells such as BHK cells and CHO cells have the machinery required for the various co- and post-translational modifications, but only with a limited capacity.

Studies on the vitamin K dependent proteins have provided evidence of a specific biological role of the propeptide in protein biosynthesis. The propeptide of von Willebrand factor has been shown to be required for multimerization of that protein, providing another function for such sequences.^27-29 Interestingly, the propeptide (except the first Thr residue) and the Gla domain of factor IX are coded by the second exon, suggesting that these two adjacent unique domains are evolutionarily one unit (Figure 2).^9

Several mutant factor IX genes containing mutations in the Gla domain^30,31 provide invaluable information on the structures required for the function of the Gla domain. One example is factor IX Chongqing,^32 which has its Glu27 replaced with Val. The Gla domain binds calcium ions with a moderately low affinity (average Kd ~0.8 mM).^33,34 Binding of calcium ions to the Gla domain is required for its conformational rearrangement from a disordered form to an ordered and organized form involving the epidermal growth factor (EGF)-like domain. This conformational rearrangement is essential for factor IX to bind to negatively charged phospholipid vesicles, provided in vivo by activated platelets. The binding localizes factor IX and augments its activation.

Recently, the X-ray crystallographic structure of the Gla domain of prothrombin fragment 1 was determined.^35,36 This structure shows that in the absence of calcium ions, most of the Gla domain (aa 1-35) is substantially disordered. However, when fragment 1 was crystallized in the presence of Ca2+, the structure of the Gla domain was found to be well ordered, giving sufficient diffracted X-ray intensity for a detailed analysis. This agrees well with the above observations obtained from experiments in solution. The Gla domain is composed of four separate short α-helices. The Gla domain of prothrombin fragment 1 binds seven Ca2+ ions. Four of them are trapped between two parallel structures formed by segments including residues 7 and 8 and residues 20, 21, 27 and 30. All Gla residues found in prothrombin are also conserved in factor IX, suggesting that similar Ca2+ binding may be expected for factor IX. The mutation (Gla27 replaced with Val) found in factor IX Chongqing,^32 therefore, apparently disturbs an important Ca2+ binding site in the Gla domain.

Furthermore, mutagenesis analyses suggest two roles for individual Gla residues. Both Gla20 and Gla21 are required for maintenance of the structure recognized by factor XIa in activation. Gla21, but not Gla20, is also necessary for the calcium-dependent conformational change and endothelial binding of factor IX.^37 Factor IX, however, contains two more Gla residues (aa 36 and 40) which are not shared by prothrombin. Whether or not these are also involved in extra Ca2+ ion binding is not known. Binding of human factor IX to endothelial cells requires a small region of the Gla domain spanning residues 3-11.^38
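The binding constants discussed here can be turned into approximate occupancies with the single-site relation θ = [Ca2+]/(Kd + [Ca2+]). The minimal Python sketch below makes several assumptions:

- Sites are independent and non-cooperative. Real Gla domains bind several Ca2+ ions cooperatively.
- Free plasma Ca2+ is 1.2 mM, an illustrative value.
- The average Gla-domain Kd is read as 0.8 mM.
- The first EGF domain site uses the ~40 μM half-saturation value the review gives.

Names are illustrative.

```python
def fraction_bound(free_ligand, kd):
    """Fractional occupancy of a single, independent binding site (same units)."""
    return free_ligand / (kd + free_ligand)


FREE_CA = 1.2e-3   # assumed free plasma Ca2+, mol/L (illustrative)
GLA_KD = 0.8e-3    # average Gla-domain Kd, read as 0.8 mM
EGF_KD = 40e-6     # half-saturation of the first EGF domain site, ~40 uM

print(f"Gla domain sites: {fraction_bound(FREE_CA, GLA_KD):.2f}")       # -> 0.60
print(f"EGF high-affinity site: {fraction_bound(FREE_CA, EGF_KD):.2f}")  # -> 0.97
```

On these assumptions the EGF-domain site would be nearly saturated in plasma, while the lower-affinity Gla-domain sites would be only partly occupied.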

The two EGF domains in factor IX do not have any growth factor-like activity^39 and may have conformations of antiparallel pleated sheets, as shown for factor X.^40 Only the first epidermal growth factor-like domain contains a high-affinity calcium binding site.^33,41 Binding of calcium ions to this domain is essential to initiate conformational rearrangements involving the Gla domain.^33,34 The first EGF domain (NH2-terminal domain, corresponding to aa 47 through 84) undergoes at least three types of post-translational modification:

- erythro-β-hydroxylation of Asp64;^42
- an O-glycosidically linked di- or trisaccharide (D-Xylα1-3-D-Glcβ1-O-Ser53, or with one more D-Xyl extension) in human or bovine factor IX, respectively;^43
- the formation of three disulphide bonds.

β-Hydroxylation of Asp64, forming β-hydroxyaspartate (Hya), is catalysed by a 2-oxoglutarate-dependent dioxygenase in liver microsomes. It is only partial in factor IX (~30% complete). This is markedly different from other proteins such as factor X, which undergoes complete modification at this site. The dioxygenase does not require vitamin K for its activity, and β-hydroxylation is a reaction independent of γ-carboxylation.^44 Using inhibitors that block aspartyl β-hydroxylation of recombinant human factor IX, the Hya residue was shown to be non-essential both for factor IX function and for binding to endothelial cells.

In a series of intrinsic protein fluorescence studies with various portions of factor IX, the high-affinity calcium binding site(s) of the first EGF domain were characterized further (half saturation at ~40 μM Ca2+). 33,34 This site is present regardless of the carboxylation state of the Gla domain. When Asp64 is not β-hydroxylated, the EGF domain still maintains a high-affinity Ca2+ binding site, although with Kd = 200-300 μM. Calcium binding at the high-affinity site in the first EGF domain appears to induce significant conformational changes in factor IX. These changes are detected as altered intrinsic protein fluorescence, higher resistance to Lys endopeptidase and lower accessibility of the disulphide bonds to reducing agents. The Gla domain has about ten-fold lower affinity for Ca2+ than the high-affinity site in the first EGF domain, and Ca2+ binding to it is required to complete the conformational rearrangement involving the Gla and EGF domains. This rearrangement is necessary for factor IX to bind to the membrane surface. As shown for protein C, 45 the EGF domain may also affect the conformation and activity of the catalytic subunit of factor IX. In addition to part of the Gla domain sequence (residues 3-11 and 21), 38 the first EGF domain may be involved in the high-affinity binding of factor IX to endothelial cells (Kd = ~2 μM). 33,34 The importance of the first EGF domain for factor IX function is supported by the detrimental effects of many mutations found in this domain. 30,31 Examples are factor IX Alabama, with Asp47 replaced with Gly (10% of normal factor IX activity), 46 factor IX New London, with Gln50 replaced with Pro (<1% activity), 47 and factor IX Hollywood, with Pro55 replaced with Ala (11% activity). 48 Interestingly, once activated, factor IX New London shows about 17% of normal activity, which is comparable with other abnormal factor IX molecules carrying mutations in the first EGF domain. The delayed activation of factor IX New London is in part responsible for its lowered activity. The replacement of Gln50 with Pro may also disrupt factor VIII binding, as observed for factor IX Alabama, which shows a reduced effect of factor VIII on activation of factor X by factor IX. 46 Replacement of Pro55 with Ala in factor IX Hollywood was speculated to disrupt a β-turn structure required for the putative antiparallel β-sheet conformation of this domain. 48
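The affinities quoted above can be put side by side with a simple single-site binding model. In this model the fraction of sites occupied is θ = [Ca2+]/([Ca2+] + Kd). The sketch below is illustrative only and rests on several assumptions that are not in the text. It treats each site as independent and non-cooperative, and it uses the reported ~40 μM half-saturation value as a Kd. It takes 250 μM as the midpoint of the 200-300 μM range for the non-hydroxylated site, and it gives the Gla domain a hypothetical Kd of 400 μM (ten-fold weaker than the EGF site). Finally, it assumes a free plasma Ca2+ concentration of about 1.2 mM.

```python
def fractional_occupancy(ca_um: float, kd_um: float) -> float:
    """Fraction of a single, non-cooperative site occupied at free Ca2+ ca_um (in uM)."""
    if ca_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return ca_um / (ca_um + kd_um)

# Illustrative Kd values in uM (assumptions, see lead-in)
SITES = {
    "first EGF domain (Hya64)": 40.0,
    "first EGF domain (Asp64, not hydroxylated)": 250.0,
    "Gla domain (hypothetical, ~10-fold weaker)": 400.0,
}

FREE_CA_UM = 1200.0  # assumed free plasma Ca2+, about 1.2 mM

if __name__ == "__main__":
    for name, kd in SITES.items():
        theta = fractional_occupancy(FREE_CA_UM, kd)
        print(f"{name}: {theta:.2f} of sites occupied")
```

Under these assumptions the three sites come out at roughly 0.97, 0.83 and 0.75 occupancy. The numbers only show how the quoted Kd values translate into occupancy; they are not measurements.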

By swapping domains, Lin et al. 49 found that the first EGF domain of factor X can functionally replace that of factor IX. The second (COOH-terminal side) EGF domain of factor X, however, cannot assume the function of its factor IX counterpart. These data, together with those mentioned above, suggest that the first EGF domain may function as a scaffold that holds the Gla region during Ca2+-induced conformational rearrangements. 34 The second EGF domain of factor IX was also suggested to be involved in interaction with factor VIII. 49 This is further supported by a mutant protein with a replacement of Asn92 with His in the second EGF domain. 50 Hertzberg et al. 51 studied a chimera carrying the factor IX amino-terminal half (signal peptide, propeptide, Gla domain and first EGF domain) joined to the second EGF domain and protease domain of factor Xa. They reported that the factor Xa portions are sufficient for interaction with factor Va. More recently, the first EGF domain was found to be required for factor IX activation by the factor VIIa-tissue factor pathway, but not by the factor XIa pathway. It is also essential for optimal activation of factor X by the factor IXa/factor VIIIa/phospholipid complex, but not for binding of either phospholipid or factor VIIIa to factor IXa (P. Bajaj, personal communication).

Most information on the structure-function relationship of the rest of the factor IX molecule, including the protease domain (catalytic subunit), has come from analysis of a large number of natural mutations, particularly missense mutations, which are distributed throughout the molecule. 30 The COOH-terminal side of the second EGF domain is connected to a linking region (aa 129-145). The function of this short sequence is not clear, with two exceptions. Cys132 forms a disulphide bond with Cys239 of the heavy chain, and Arg145 forms one of the two proteolytic cleavage sites for activation of factor IX. 52 The importance of these residues is supported by mutant factor IX molecules in which Cys132 is replaced with Arg (factor IX Dakar) 30 and Arg145 is replaced with Cys (factor IX Cardiff and others) 53 or His (factor IX^, m] and others). 52 During proteolytic activation, two peptide bonds are cleaved, one between Arg145 and Ala146 and the other between Arg180 and Val181, releasing an activation peptide of 35 amino acid residues. 54 Structural requirements for the activation peptide appear not to be stringent, except for the sequences immediately neighbouring the proteolytic cleavage sites, which are involved in specific interactions with the activating enzyme. Apart from the mutations mentioned above, no missense mutations have been found in the linking region or the activation peptide. This agrees with the notion that these regions function as spacers and are generally permissive of amino acid sequence changes.
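The residue arithmetic behind activation can be checked with a short sketch. It uses the inclusive numbering of the mature protein given above, with cleavage after Arg145 and after Arg180. It adds one assumption not stated in this section: that mature human factor IX is 415 residues long. The helper names are hypothetical.

```python
from typing import List, Tuple

MATURE_LENGTH = 415  # assumption: number of residues in mature human factor IX

def fragments(cleave_after: List[int], length: int) -> List[Tuple[int, int]]:
    """Split residues 1..length into inclusive (first, last) ranges,
    cutting the peptide bond after each listed residue."""
    bounds: List[Tuple[int, int]] = []
    start = 1
    for site in sorted(cleave_after):
        if not start <= site < length:
            raise ValueError(f"invalid cleavage site {site}")
        bounds.append((start, site))
        start = site + 1
    bounds.append((start, length))
    return bounds

def span(first: int, last: int) -> int:
    """Number of residues in an inclusive range."""
    return last - first + 1

if __name__ == "__main__":
    # Arg145-Ala146 and Arg180-Val181 are the bonds cleaved during activation
    names = ["light chain", "activation peptide", "heavy chain"]
    for name, (a, b) in zip(names, fragments([145, 180], MATURE_LENGTH)):
        print(f"{name}: residues {a}-{b} ({span(a, b)} residues)")
    print("linking region 129-145:", span(129, 145), "residues")
```

Because the ranges are inclusive, residues 146-180 give the 35-residue activation peptide stated in the text. The linking region aa 129-145 comes to 17 residues.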

Mutations found in the catalytic subunit are also distributed throughout the domain. Mutations in some regions of the catalytic subunit apparently destabilize the protein or impair its secretion, resulting in low-antigen, low-activity variants. 30,31 Examples are Pro287 replaced with Leu, Ala291 replaced with Pro and Thr296 replaced with Met. Mutations in highly conserved areas in the immediate neighbourhood of the active-site residue (Ser365) cause a severe disorder, suggesting that these sequences are essential for keeping the active-site structure functional. Examples are Gly363 to Val, Pro368 to Thr and Gly367 to Arg. Factor IX Eagle Rock, for instance, has Val instead of Gly363. It is apparently as stable as normal factor IX but cannot form a correct active-site conformation because of the Val side chain. 55

The haemophilia BM phenotype is characterized by a prolonged partial thromboplastin time and a prolonged prothrombin time with ox (but not human) brain tissue factor. 56 It is associated with mutations in two distinct regions: the activation site area (Arg180-182) and another area (residues 390-397). 30,31 These mutant factor IX proteins likely interfere with the proper binding of factor VII (or VIIa) to bovine tissue factor.

A large number of amino acid residues (about 60% of the factor IX sequence) apparently serve just as spacer sequences that maintain the overall factor IX protein structure. These residues can be replaced with most other amino acids carrying different side chains without resulting in haemophilia B. 57

More recently, a high-affinity Ca2+ binding site (Kd = ~500 μM) in the catalytic subunit was reported to be possibly involved in binding of factor IXa to factor VIII. 58 Carbohydrate chains attached to factor IX may play an important role in the activation or function of factor IX, as observed for factor X. 59

Structure of factor IX gene 

The complementary DNA and the gene of human factor IX have been cloned and their complete nucleotide sequences determined. 8,9,60 The nucleotide numbering originally employed for the complete contiguous sequence 9 has become the standard system for the gene and is used in this article. The factor IX gene is composed of eight exons spanning about 34 kilobase (kb) pairs (Figure 3). 9 The size of the gene, however, may be as large as 40 kb, depending on unidentified regulatory elements located in the 5' and/or 3' flanking sequences. The factor IX gene is located on the X chromosome at q27. The order along the chromosome is centromere, HPRT (q26), FIX (q27.1), fragile site (q27.3), factor VIII (q28), telomere. 61-63

Figure 3. Organization of the human factor IX gene and domain structures of the factor IX molecule. Exons are shown as solid vertical bars with exon numbers. S, P, Gla, H, EGF, L and AP represent signal peptide, propeptide, Gla domain, hydrophobic sequence, EGF-like domain, link sequence and activation peptide, respectively. Corresponding exons and domains are shown by lines. Numbers below domain structures indicate corresponding amino acid (aa) residue numbers.

958 Blood Coagulation and Fibrinolysis, Vol 4, 1993

Biology of factor IX

[Figure 4 plot: exon maps drawn to a common scale; axis in kilobases]

Figure 4. Comparison of gene organizations for factor IX (FIX), factor X (FX), factor VII (FVII), protein C (PC), and prothrombin (PT). Exons are depicted by solid vertical bars. Corresponding exons are shown by thin lines. Modified from Kurachi and Chen. 83

Bottema et al. 64 reported that the G + C content of the factor IX gene (40%) cannot be explained by C to T or G to A transitions at CpG sites alone, although this content is in general agreement with that of mammalian genomes. The mutation rate at CpG sites is elevated about 24-fold relative to other transitions and 7.7-fold relative to other transversions. Given these enhanced rates at CpG, a two-fold mutational enhancement for transitions and a three-fold enhancement for transversions at non-CpG sites would be sufficient to produce the low G + C content of the factor IX gene. The G + C content of some other genes, such as the protein C gene (over 50%), is significantly higher than that of the factor IX gene. The precise reason for this difference is not known. One speculative explanation is the specific location of each gene in chromatin, which may vary in susceptibility to base changes. Another is evolutionary pressure to minimize the number of mutations in some crucial genes, keeping functional changes to a minimum.
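The idea that biased point mutation alone can set a gene's base composition can be illustrated with the classic two-state mutation-pressure model. This is a swapped-in textbook model, not the calculation used by Bottema et al. Suppose G:C pairs mutate to A:T at rate v and A:T pairs mutate to G:C at rate u. The G + C fraction then settles at u/(u + v). The sketch below inverts this relation to ask what overall bias a 40% G + C content implies. The rates are in arbitrary units and are purely illustrative.

```python
def equilibrium_gc(u: float, v: float) -> float:
    """Steady-state G+C fraction when A:T->G:C occurs at rate u and G:C->A:T at rate v."""
    if u <= 0 or v <= 0:
        raise ValueError("rates must be positive")
    return u / (u + v)

def required_bias(gc: float) -> float:
    """Ratio v/u (G:C->A:T over A:T->G:C) that yields the given equilibrium G+C fraction."""
    if not 0.0 < gc < 1.0:
        raise ValueError("gc must lie strictly between 0 and 1")
    return (1.0 - gc) / gc

def simulate(gc0: float, u: float, v: float, steps: int) -> float:
    """Iterate the deterministic recursion gc' = gc + u*(1 - gc) - v*gc."""
    gc = gc0
    for _ in range(steps):
        gc = gc + u * (1.0 - gc) - v * gc
    return gc

if __name__ == "__main__":
    bias = required_bias(0.40)  # factor IX gene G+C content quoted in the text
    print(f"v/u needed for 40% G+C: {bias:.2f}")
    u = 1e-3                    # arbitrary illustrative rate per step
    print(f"G+C after 20000 steps from 50%: {simulate(0.5, u, bias * u, 20000):.3f}")
```

A net G:C to A:T pressure about 1.5 times the reverse pressure is enough, under this simplified model, to hold the composition at 40% G + C from any starting point. The published argument instead partitions this pressure between CpG and non-CpG transitions and transversions.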

The exon-intron organization of the factor IX gene, including its splicing phases, is surprisingly similar to that of several other vitamin K-dependent coagulation factors. This indicates typical divergent evolution involving a large number of point and minor mutations after exon-shuffling events. 9 The overall sizes of these homologous genes vary greatly because their corresponding introns differ in size (Figure 4). They range from 11 kb for the protein C gene 65 to 34 kb for the factor IX gene. 9 Protein S 66 and prothrombin 67 have only the first three exons homologous to factor IX. These exons encode the prepro leader sequence, the Gla domain and a short hydrophobic sequence, but the rest of these molecules have grossly different structures. Prothrombin has two kringles and a protease domain with a different exon-intron organization, while protein S has four EGF domains followed by a domain similar to steroid hormone binding protein. Protein Z has an organization similar to factor IX and other closely related factors. It lacks, however, two crucial amino acid residues required for formation of the protease active site and has no protease activity. 68,69 The number of introns and the splicing phases in the catalytic domain of factor IX are identical to those of the genes for factor X, 70 factor VII 71 and protein C. 65 They are distinctly different from those of prothrombin, which has more introns in this domain.

The eight exons of the factor IX gene encode distinct domain structures (Figures 2 and 3). The 5' untranslated (UT) sequence and the signal peptide are encoded by the first exon. The entire propeptide and the Gla domain are encoded as one genetic unit by the second exon, reflecting their coordinated functions in the vitamin K-dependent γ-carboxylation of glutamic acid residues in the Gla domain. 9 The only exception is the first Thr residue of the propeptide, at -18, which is coded by the first exon. The remaining introns are also located at positions separating the various unique domains.

Currently, more than ten polymorphic sites have been identified in or near the human factor IX gene (Table 1). Most of these are changes in restriction sites. They include a Bam HI site in the immediate 5' flanking region (nucleotide sequence: C or T), 72 a Hinf I/Dde I site in a 50 bp AT-rich insert in the first intron, 73 and an Xmn I site in the third intron (G or C). 73 Others are a Taq I site in the fourth intron (C or T) 74 and an Mnl I site in the sixth exon, originally identified as the Thr/Ala dimorphism at aa 148. 75 A site that is highly polymorphic (A > G) in the Japanese population was found in the first intron. 76 An intragenic Bam HI polymorphic site, located ~500 bp 5' to the Xmn I polymorphic site in intron 3, is found in about 50% of the African American population. 77 An Msp I polymorphism in strong disequilibrium with the Taq I polymorphism in the fourth intron was also reported. 78 Most intragenic polymorphic sites are in strong linkage disequilibrium, and their alleles co-segregate in 70-80% of Caucasian factor IX genes. The Bam HI polymorphic site in the 5' end region and the Hha I polymorphism located 8 kb downstream in the 3' flanking sequence 79,80 are in equilibrium. Two extragenic polymorphic sites lie in the 5' upstream flanking sequence: an Sst I site (detected with DXS99) and a Taq I site (detected with DXS102). Because of cross-over events in meiosis, these sites may have a 3-5% chance of recombination with the factor IX gene. 81,82 Linkage disequilibrium among the intragenic polymorphic sites and the possibility of recombination for the extragenic sites significantly hamper the use of these markers for carrier detection and prenatal diagnosis. The overall usefulness of these polymorphisms in carrier detection is about 90% for whites and blacks. Frequencies of these polymorphisms vary greatly among ethnic groups. 83 Except for the A/G polymorphic site in the first intron, 76 all intragenic restriction fragment-length polymorphisms present in Caucasians and African Americans are absent or extremely rare in Asians. 83
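Linkage disequilibrium between two biallelic sites is commonly summarized by D, its normalized form D′, and r². The sketch below computes these statistics from phased haplotype counts. The counts in the example are hypothetical and are not taken from Table 1 or the cited studies. When r² approaches 1, typing a second site adds almost nothing for a family that is uninformative at the first site. This is the practical sense in which strong disequilibrium among the intragenic markers limits their combined usefulness.

```python
from dataclasses import dataclass

@dataclass
class LDStats:
    d: float        # D = p11 - pA1 * pB1
    d_prime: float  # D / Dmax, sign kept; |D'| = 1 when at least one haplotype is absent
    r2: float       # squared allelic correlation between the two sites

def ld_stats(n11: int, n12: int, n21: int, n22: int) -> LDStats:
    """LD between two biallelic sites from phased haplotype counts.

    nij = number of chromosomes carrying allele i at site A and allele j at site B.
    """
    n = n11 + n12 + n21 + n22
    if n == 0:
        raise ValueError("no haplotypes counted")
    p11 = n11 / n
    pa = (n11 + n12) / n  # frequency of allele 1 at site A
    pb = (n11 + n21) / n  # frequency of allele 1 at site B
    d = p11 - pa * pb
    if d >= 0:
        d_max = min(pa * (1 - pb), (1 - pa) * pb)
    else:
        d_max = min(pa * pb, (1 - pa) * (1 - pb))
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = pa * (1 - pa) * pb * (1 - pb)
    r2 = d * d / denom if denom > 0 else 0.0
    return LDStats(d, d_prime, r2)

if __name__ == "__main__":
    # Hypothetical counts for two intragenic sites (not data from Table 1)
    print(ld_stats(40, 10, 10, 40))  # moderate association
    print(ld_stats(50, 0, 0, 50))    # complete association: second site is redundant
```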

Besides the polymorphic AT-rich sequence in the first intron, tandem purine and pyrimidine dinucleotide repeats are also present in the 3' UT sequence, in the form of four different alleles. These repeats are polymorphic in most human races. 84 These polymorphisms further improve carrier determination and prenatal diagnosis of haemophilia B.

Table 1. Allele frequencies of polymorphisms of human factor IX gene in various populations (a)

Values are number of chromosomes (allele frequency); - = not reported.

Restriction enzyme  Position        Allele (kb)  Caucasian    African American  Asian
Sst I               5' extragenic   6            76 (0.48)    38 (0.53)         63 (0.5)
                                    9            81 (0.52)    34 (0.47)         63 (0.5)
Taq I               5' extragenic   10           18 (0.90)    -                 -
                                    1.2          2 (0.10)     -                 -
Bam HI              5' extragenic   25           90 (0.94)    50 (0.64)         100 (1.0)
                                    23           6 (0.06)     28 (0.36)         0 (0.0)
Hinf I/Dde I        intron 1        1.75         10 (0.21)    14 (0.36)         0 (0.0)
                                    1.70         36 (0.79)    26 (0.64)         30 (1.0)
Bam HI (b)          intron 3        15           32 (1.0)     11 (0.52)         -
                                    13           0 (0.0)      10 (0.48)         -
Xmn I               intron 3        11.5         42 (0.7)     12 (1.0)          76 (1.0)
                                    6.5          16 (0.3)     0 (0.0)           0 (0.0)
Taq I               intron 4        1.8          285 (0.68)   15 (0.9)          61 (1.0)
                                    1.4          129 (0.32)   2 (0.1)           0 (0.0)
Msp I               intron 4        2.4          40 (0.8)     22 (0.4)          57 (1.0)
                                    5.8          10 (0.2)     31 (0.6)          0 (0.0)
Mnl I (c)           exon 6          (Ala)        10 (0.29)    8 (0.12)          0 (0.0)
                                    (Thr)        25 (0.71)    60 (0.88)         95 (1.0)
Hha I               3' extragenic   0.2          13 (0.38)    11 (0.33)         43 (0.83)
                                    0.15         21 (0.62)    22 (0.67)         9 (0.17)
Purine/pyrimidine                   I            4 (0.29)     -                 0 (0.00)
polymorphism (d)                    II           10 (0.71)    -                 13 (0.93)
                                    III          0 (0.00)     -                 1 (0.07)

a The data summarized in this table are a composite of data reported in papers including Hay et al., 72 Winship et al., 73 Camerino et al., H7S Driscoll et al., 77 Reiner et al., 90 Mulligan et al., tx Hofker et al., u Kurachi et al., 83 Sarkar et al. 94 and personal communications (S.-H. Chen).
b This second Bam HI site is located 500 bp 5' to the Xmn I polymorphic site and was detected by a Bam HI/Sph I digest.
c This is known as the Thr/Ala-148 dimorphism, which codes for either Thr or Ala.
d Allele I, (GT)4ATGC(GT),AG(AC)4GCAT(AC)3; Allele II, (GT)5TGC(GT)4AG(AC)4GCAT(AC)3; Allele III, (GT)5ATGC(GT)4AG(AC),GCAT(AC)3.
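The frequencies in Table 1 are allele counts divided by the number of chromosomes typed. A common way to compare such markers for carrier testing is the expected heterozygosity, H = 1 - Σp_i². This is the chance that a woman carries two different alleles and is therefore informative for the marker. The sketch below applies it to the Caucasian counts for the Sst I and 5' Bam HI sites in Table 1. Using H as the measure of informativeness, and the Hardy-Weinberg proportions it assumes, are assumptions here and are not statistics reported in the table.

```python
from typing import Dict

def allele_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert allele counts (numbers of chromosomes) into relative frequencies."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no chromosomes counted")
    return {allele: n / total for allele, n in counts.items()}

def expected_heterozygosity(freqs: Dict[str, float]) -> float:
    """H = 1 - sum(p_i^2): chance a woman carries two different alleles,
    assuming Hardy-Weinberg proportions."""
    return 1.0 - sum(p * p for p in freqs.values())

# Caucasian chromosome counts transcribed from Table 1 (allele sizes in kb)
CAUCASIAN_COUNTS = {
    "Sst I (5' extragenic)": {"6 kb": 76, "9 kb": 81},
    "Bam HI (5' extragenic)": {"25 kb": 90, "23 kb": 6},
}

if __name__ == "__main__":
    for marker, counts in CAUCASIAN_COUNTS.items():
        freqs = allele_frequencies(counts)
        shown = ", ".join(f"{a}: {p:.2f}" for a, p in freqs.items())
        print(f"{marker}: {shown}; H = {expected_heterozygosity(freqs):.2f}")
```

The calculation reproduces the tabulated frequencies (0.48/0.52 and 0.94/0.06). It gives H of about 0.50 for the Sst I site and about 0.12 for the 5' Bam HI site in Caucasians.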

The human factor IX gene contains many repetitive sequences, such as Alu sequences and long interspersed elements (Kpn I repetitive sequences, abbreviated as Line-1 or L-1). Five Alu sequences are present in introns and in the immediate 3' flanking region. Two Line-1 sequences are also present: a partial element in the fourth intron and a 6.1 kb complete element in the 5' flanking region. 9 A novel, short interspersed repeat sequence (Ano) is also present in the first intron. 85

Abnormal factor IX genes 

To date, more than 600 haemophilia B patients have been studied for molecular defects in their factor IX genes. 30 Of the 574 patient entries in the 1992 database (third edition), 278 (48%) are unique. The rest are repeats, which may reflect independent mutations or founder effects.

The mechanisms of mutation found in haemophilia B genes are highly heterogeneous. They include at least 29 cases of complete or partial gene deletions or more complicated gene rearrangements, and 50 short (less than 20 nucleotide) deletions, insertions or both. They also include a large number of single-base mutations, comprising 524 cases of missense and nonsense mutations and mutations at splice sites, in the 5' UTR and in the flanking sequence. 30,31 Gross and relatively large gene deletions, insertions and rearrangements account for only about 4% of all mutations. They can be rapidly detected by Southern blot analysis or the polymerase chain reaction (PCR) as missing or rearranged DNA fragments. Some of the gross gene deletions may be parts of much larger deletions spanning more than 500 kb. 31 All patients with gross gene deletions are severely affected. However, only two-thirds of these patients have developed alloantibodies (inhibitors) against the human factor IX infused during replacement therapy. Some patients with detectable factor IX antigen also develop alloantibodies. These observations indicate that inhibitor development against normal factor IX infused in protein replacement therapy is primarily due to secondary factors. These include treatment regimens and/or polymorphisms in the immune response system of individuals. It is not simply due to absence of the entire factor IX antigen, or of specific epitopes, resulting from gross or partial gene deletions.

No obvious hot spots for deletion breakpoints have been observed. However, factor IX Seattle x, which has a deletion of about 10 kb spanning introns 4 through 6, has been shown to involve a 14 bp sequence (TAGAAGTTCACTT) that is duplicated 10 kb apart in introns 4 and 6. 86 In some cases, well-known repetitive sequences are apparently involved, such as Alu sequences (highly abundant repetitive sequences about 300 bp in length). 87 An interesting mutant gene with an insertion is factor IX El Salvador. 88 It carries about 6 kb of extra sequence, likely an L-1 element, inserted in the fourth intron. The insertion lies within the 0.8 kb between the 3' end of exon 4 and the first EcoRI site in this intron. Whether the inserted sequence is directly responsible for this case of haemophilia B is yet to be determined. The Line-1 element has a complete transcriptional unit with two open reading frames, including a retroviral reverse transcriptase-like sequence. 88 Line-1 insertion in exon 14 of the factor VIII gene has been reported as a novel mutational mechanism. 89 A Line-1 sequence inserted in an intron may possibly generate an extra set of splicing sequences, causing abnormal processing of the factor VIII gene in a mild haemophiliac. 90 Otherwise, Line-1 element insertion in introns may not be deleterious. Haemophilia in this type of kindred may then be incidental, probably due to an unidentified second mutation elsewhere in the gene.

About 40 mutant factor IX genes with small deletions have been found, distributed in exons as well as in introns. 30,31 These deletions are one, two, three, four, seven or more than 13 bases in size. No specific hot spots for these small deletion mutations have been observed. Point mutations found in abnormal factor IX genes are distributed throughout the factor IX molecule. This suggests that the entire structure of the factor IX protein is highly optimized and that almost every part of the molecule is essential for maintaining its overall structure and/or specific function. A large part of the sequence is estimated to serve as spacers that maintain the overall globular structure, as mentioned above. 57 This may differ from factor VIII, with its large, dispensable central B domain. Currently, no function other than that of a large spacer (activation peptide) has been identified for the B domain of factor VIII. 91,92

Missense mutations account for the majority (70%) of the point mutations in abnormal factor IX genes, while nonsense mutations account for about 16%. 30 These mutations, particularly missense mutations, cause subtle changes in factor IX structure. They produce a wide spectrum of clinical severity and provide insights into the structure-function relationship of factor IX. Several features make factor IX perhaps the most amenable of all coagulation factors for exhaustive study of structure-function relationships. These are its relatively small size, the large number of available mutant genes with missense mutations and knowledge of the complete gene structure.

The mechanisms responsible for the point mutations found in the factor IX gene are highly heterogeneous. Among them, the CpG dinucleotide has been clearly recognized as a mutational hot spot. 30,31 Eighteen CpG sequences (36 when both strands of the DNA are counted) are present in the coding region. This is only about a quarter of the number expected if dinucleotides occurred at random. Endogenous methylase converts some of the deoxycytidines in CpG sequences to 5-methyldeoxycytidine, which is then spontaneously converted to deoxythymidine by deamination. Because no cellular repair mechanism is present for this conversion, mutation rates at CpG in factor IX genes are elevated over those at non-CpG sequences. The increase is at least 24-fold for transitions and 7.7-fold for transversions. 64,93,94 Interestingly, the lack of a repair mechanism alone cannot account for the reduced frequency of CpG sequences in the gene. 64 About 45% of all unique point mutations found in factor IX genes are due to mutations at CpG sites, and some of these may be due to founder effects. Data obtained on factor IX genes agree well with those for other genes, such as the factor VIII gene. 95 Twelve CpG sites in the factor IX gene have been mutated once or multiple times. Some CpG mutations, such as Thr296 replaced with Met, recur in unrelated families, further supporting these CpG sites as hot spots. 96 Other CpG sites have no reported mutations. The underlying mechanism for this is not known; it is possible, however, that these CpG sites are somehow inaccessible to methylase in the chromatin structure. Multiple different mutations at the same nucleotide, at CpG or non-CpG sites, have also occurred. 30,31 Examples are:
- nt -6G (5' UTR) replaced by A or C;
- nt +13A (5' UTR) replaced by G or C, or deleted;
- nt 6365G in the codon for Arg-4 replaced by T or A;
- nt 6704T (splice site) replaced by G or C;
- nt 10419G in the codon for Cys56 replaced by C or A;
- nt 20524G in the codon for Val182 replaced by C or T;
- nt 20566G (splice site) replaced by T or A;
- nt 30992G in the codon for Ala291 replaced by C or A;
- nt 31290C in the codon for Ala390 replaced by T or A.
These data further support the concept that the mechanisms responsible for the mutations are highly heterogeneous. They also show that the factor IX structure has been extensively tested and refined by mutational events in the course of evolution.
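The CpG depletion described above is usually expressed as an observed/expected ratio: the CpG count divided by (C count × G count)/length. This ratio is about 1 for a random sequence. A coding region with roughly a quarter of the expected CpG count, as stated for factor IX, corresponds to a ratio near 0.25. The sketch below computes the ratio for any DNA string. The demonstration string is arbitrary and is not factor IX sequence. The sketch also shows why the CpG count doubles when both strands are counted. CpG is its own reverse complement, so every CpG on one strand pairs with a CpG on the other.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def count_cpg(seq: str) -> int:
    """Number of CpG dinucleotides read 5'->3' on this strand."""
    s = seq.upper()
    return sum(1 for i in range(len(s) - 1) if s[i:i + 2] == "CG")

def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return count_cpg(s) * len(s) / (c * g)

if __name__ == "__main__":
    demo = "TTCGAACGGA"  # arbitrary example, not factor IX sequence
    print("CpG on each strand:", count_cpg(demo), count_cpg(reverse_complement(demo)))
    print("observed/expected:", round(cpg_obs_exp(demo), 2))
```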

An interesting case of somatic mosaicism involving an abnormal factor IX with a mutation (Cys350 to Ser, due to a G to C change) has been reported. 97 In a family affected with haemophilia B, a male member was very mildly affected (35% of normal factor IX activity and 45% of normal antigen level). Two female members of this kindred, the daughter and granddaughter of the affected male, were moderately affected (3% activity and 4% antigen of the normal factor IX level). Factor X and factor VII levels, as well as the prothrombin time, were normal. In both the leucocytes and the liver of the affected male, about 10% of cells carry the normal gene and about 90% carry the mutant gene. In contrast, his kidney and smooth muscle cells carry normal and mutant genes in about equal proportions. These results indicate somatic mosaicism. It probably arose from a replication or post-replication repair error during the first mitotic divisions of the zygote before implantation. Alternatively, it may have arisen from a half-chromatid mutation generated during meiosis that was not corrected before fertilization. In the leucocytes of the two female patients, normal and mutant genes are present in equal amounts. These data suggest that not only liver but also leucocytes may be of endodermal origin, contrary to the commonly held mesodermal origin of leucocytes. A rare case of severe haemophilia B in a girl, due to non-random inactivation of a normal factor IX gene, has also been reported. 98

A class of abnormal genes with mutations in the Leyden-specific region (LS-region, arbitrarily defined as the region from about nt -40 to +20) gives rise to the haemophilia B Leyden phenotype. This phenotype shows a unique delay in factor IX expression until the onset of puberty. Eighteen mutations have been found in the 5' end region of the factor IX gene. One of these is located at nt -793 in the 5' upstream region. The remaining 17 represent twelve unique mutations in the LS-region, eleven of which result in the Leyden phenotype.

Regulation of the factor IX gene expression

The factor IX gene is expressed in liver with high tissue specificity. 99 Illegitimate expression in other tissues is also observed. 100

Figure 5. Steady-state liver mRNA levels for mouse factor IX at various developmental stages. Factor IX mRNA levels

Figure 6. Analysis of the 5' end flanking sequence of the factor IX gene for promoter activity. Expression activities of CAT constructs containing various portions of the 5' end flanking sequence were assayed in HepG2 cells and Hepa1-6 cells. 99 The 5' end sequence of the factor IX gene is shown with solid lines, with the CAT gene shown as an open box at the 3' end. CAT activities relative to that of p-416/4CAT are shown as percentages. The thin lines connected in the middle indicate deleted areas. The 5' and 3' ends of the factor IX gene sequence contained in each CAT construct are shown in the labels as two numbers separated by a slash. 9 p-416/29CAT, which contains a factor IX sequence extended to nt +29 at the 3' end, shows expression activity identical to that of p-416/4CAT. Deletion of the 3' end sequence of p-416/29CAT from nt +29 up to -2 did not affect the expression activities of CAT constructs. However, further deletion beyond nt -3 dramatically reduced the activity.

During most of the gestational period, the factor IX gene is expressed only at a low level (3-5% of the adult level) until the late stage of the third trimester. This has been shown for humans, with limited data, 101 for lambs 102 and more completely for mice. 103 The developmental time course of factor IX gene expression in mouse liver shows induction of high-level expression on day 18 of gestation, the late stage of the third trimester (Figure 5). Expression continues to increase through birth and then rises more gradually, reaching the adult level at weaning (20-24 days of age). At birth, the factor IX mRNA level is only 43% of the adult level, and the plasma factor IX activity level agrees well with the mRNA level. These results agree well with the limited data available for humans. 104,105 The low level of factor IX mRNA at birth may be partly responsible for haemorrhagic disorders in pre-term or term neonates. This condition, however, may be aggravated by the generally poor vitamin K synthesis in neonates. It may be aggravated further if antibiotics or vitamin K antagonists such as warfarin are given to the mother during the prenatal stage. 106 Other pathological conditions, such as diarrhoea and cystic fibrosis, also lower the vitamin K level, resulting in secondary haemorrhagic diseases. 107

Figure 7. Locations of major functional elements in the 5' end region of the factor IX gene. The nucleotide numbering system is from Yoshitake et al. 9 The nucleotide sequence of the silencer is of the complementary strand.

As in any other gene, the 5' end region of the factor IX gene contains various short sequences that function as cis-acting elements in its regulation. 99 These elements have been analysed systematically with expression vector constructs in which variously deleted 5' end regions of the factor IX gene are ligated to the chloramphenicol acetyltransferase (CAT) reporter gene (Figure 6). 99 A schematic drawing of the overall organization of the major functional elements identified is shown in Figure 7. The fundamental elements necessary for high-level expression of the factor IX gene are contained in approximately the first 300 bp of the 5' end region (Figure 7). As more 5' upstream sequence beyond the nt -400 region is included in the CAT constructs, lower expression activities are observed (Figure 6). A construct extending to about nt -1900 shows only low-level activity, 16% of that of the construct extending to nt -416. Even lower expression (less than 3%) is observed when 5' upstream sequence up to -6.9 kb is included in the expression vector. The factor IX gene does not have a typical TATA sequence in its 5' flanking sequence. However, on the basis of the functional analysis data, two sequences have been tentatively identified as the functional promoter boxes. TCAAAT, starting at nt -187, is proposed as the TATA box, and AGCCACT, starting at nt -238, as the CCAAT box. AGCCACT agrees well with the consensus CCAAT sequence. 108

The locations of the fundamental transcriptional elements of the factor IX gene agree well with the transcriptional start site placed in the region of nt -150 for CAT constructs, which was revised from the previously assigned site (+1 site). 109,110 The primary transcription start site in liver was determined to be at nt -176 in the 5' upstream region by primer extension and DNase I protection analyses with high-quality poly(A)+ RNA preparations from human livers. 111,112 Reverse transcription-mediated PCR with primers at or downstream of nt -176 can amplify products, whereas a primer in the nt -300 region, where no signals for transcription initiation were observed, cannot, further supporting nt -176 as the initiation site in liver. A dog factor IX cDNA with a 5' extension at least up to -179 is also in good agreement with the revised 5' upstream start site for the human gene. 113 The previously observed transcription initiation site (+1) is likely a secondary start site in liver or an artifact, probably due to the poor quality of the RNA preparations used. This site, which lies in the middle of a region designated the LS-region, could adopt a unique secondary structure that makes factor IX mRNA highly susceptible to degradation at this site. Crossley and Brownlee 114,115 reported functional analyses of the 5' end promoter region for transcriptional activity using a CAT construct containing 5' sequence up to nt -189 as a control for full CAT activity. This construct, however, has only about 15% of the activity of the optimal constructs, which contain 5' sequence up to nt -300.

Figure 8 sequence (sense strand, old nucleotide numbering):
-300 CATTGCTCTCTGACAAAGATACGGTGGGTCCCACTGATGAACTGTGCTGC
-250 CACAGTAAATGTAGCCACTATGCCTATCTCCATTCTGAAGATGTGTCACT
-200 TCCTGTTTCAGACTCAAATCAGCCACAGTGGCAGAAGCCCACGAAATCAG
-150 AGGTGAAATTTAATAATGACCACTGCCCATTCTCTTCACTTGTCCCAAGA
-100 GGCCATTGGAAATAGTCCAAAGACCCATTGAGGGAGATGGACATTATTTC
-50  CCAGAAGTAAATACAGCTCAGCTTGTACTTTGGTACAACTAATCGACCTT
+1   ACCACTTTCACAATCTGCTAGCAAAGGTTATGCAGCGCGTGAACATGATC
     (translation from the ATG at nt +30: Met Gln Arg Val Asn Met Ile ...)
[Labels in the original figure, positions not recoverable from this copy: CCAAT box, TATA box, C/EBP (two sites), DBP, NF-1, and mutant bases above the LS-region.]

Figure 8. The nucleotide sequence of the 5' end region (300 bp) with various elements. The numbers on the left indicate the old nucleotide numbering. The revised primary transcription start site in liver is shown with an asterisk. Part of the signal peptide starting with the first Met residue at aa -46 is shown at the bottom line. Solid underlines indicate protein binding regions. Dotted underlines indicate the tentatively identified functional CCAAT and TATA boxes. The LS-region is arbitrarily defined as the region roughly spanning nt -40 to +20. Mutations found in haemophilia B Leyden genes at nt -20, -6, -5, +6, +8 and +13 are shown with short vertical bars, with the mutant sequences above the LS-region. The mutation at nt -26 (G to C change) does not show a Leyden phenotype.

Structural elements homologous to known liver-specific enhancers are also present in the 5' end promoter region. These include TGGACC (a partial LF-A1 or HNF-4 element) at nt -359 in the sense strand and CTTTGGACT (PRI element) at nt -79 in the antisense strand, which are also present in other genes such as the α1-antitrypsin, transferrin and antithrombin III genes. The region from nt -99 to -76, containing CTTTGGACT, has been reported to bind NF-1, 114 which was originally identified as a liver-specific enhancer protein. 116,117
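The coordinates quoted in this section (the putative CCAAT and TATA boxes, the antisense PRI element and the Leyden mutation sites) can be cross-checked against the Figure 8 sequence with a short Python sketch. It assumes that the old numbering has no position 0 (the -50 line of Figure 8 runs to -1 and the next line starts at +1), and it reads the base printed as "^" in the -200 line of the scanned figure as C; the function names are illustrative only.

```python
# Sense-strand sequence of Figure 8, one 50-nt line per entry, keyed by the
# old nucleotide number of the line's first base. The base printed as "^"
# in the -200 line of the scanned figure is read here as C (assumption).
FIG8_LINES = {
    -300: "CATTGCTCTCTGACAAAGATACGGTGGGTCCCACTGATGAACTGTGCTGC",
    -250: "CACAGTAAATGTAGCCACTATGCCTATCTCCATTCTGAAGATGTGTCACT",
    -200: "TCCTGTTTCAGACTCAAATCAGCCACAGTGGCAGAAGCCCACGAAATCAG",
    -150: "AGGTGAAATTTAATAATGACCACTGCCCATTCTCTTCACTTGTCCCAAGA",
    -100: "GGCCATTGGAAATAGTCCAAAGACCCATTGAGGGAGATGGACATTATTTC",
    -50: "CCAGAAGTAAATACAGCTCAGCTTGTACTTTGGTACAACTAATCGACCTT",
    1: "ACCACTTTCACAATCTGCTAGCAAAGGTTATGCAGCGCGTGAACATGATC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_map(lines):
    """Map old nucleotide number -> base; -1 is followed directly by +1."""
    seq = {}
    for start, bases in lines.items():
        pos = start
        for base in bases:
            seq[pos] = base
            pos += 1
            if pos == 0:
                pos = 1  # the numbering skips 0
    return seq


SEQ = build_map(FIG8_LINES)
POSITIONS = sorted(SEQ)
SENSE = "".join(SEQ[p] for p in POSITIONS)


def find_sense(motif):
    """Old nucleotide number of the 5' base of each sense-strand match."""
    return [POSITIONS[i] for i in range(len(SENSE) - len(motif) + 1)
            if SENSE.startswith(motif, i)]


def find_antisense(motif):
    """Antisense matches, reported at the sense-strand number paired with
    the motif's 5' base (the 3'-most base of the sense-strand hit)."""
    rc = motif.translate(COMPLEMENT)[::-1]
    return [POSITIONS[i + len(rc) - 1] for i in range(len(SENSE) - len(rc) + 1)
            if SENSE.startswith(rc, i)]


print(find_sense("AGCCACT"))        # [-238], putative CCAAT box
print(find_sense("TCAAAT"))         # [-187], putative TATA box
print(find_antisense("CTTTGGACT"))  # [-79], PRI element on the antisense strand
# Wild-type bases at the Leyden sites and at nt -26
print({p: SEQ[p] for p in (-26, -21, -20, -6, -5, 6, 8, 13)})
# {-26: 'G', -21: 'T', -20: 'T', -6: 'G', -5: 'A', 6: 'T', 8: 'T', 13: 'A'}
```

The wild-type bases returned at nt -26, -21, -20, -6, -5, +6, +8 and +13 match the "from" base of each mutation listed for the Leyden families and for the nt -26 (G to C) variant, which supports the no-zero reading of the numbering.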

Several negative regulatory elements (silencers), identical or similar to those found in other genes, are present in the region of about nt -700 through -2000. These elements are responsible for the reduced activity observed for CAT constructs containing various portions of sequence from this region. Among them, a sequence spanning -1.4 kb to -1.7 kb contains two sequence elements, ATCCTCTCC starting at nt -1680 and CAATGGTT at nt -1621, which are similar to well-characterized consensus silencer elements (negative regulatory elements). 99,118 When a sequence containing these elements was placed downstream of the CAT gene at the BamHI site in p-416/+4CAT, which contains the factor IX promoter sequence (nt +4 through -416), both orientations of this sequence (sense and reverse) reduced the expression activity to 21-26% of that of p-416/+4CAT. These results indicate that the silencer elements in this region are functional in the factor IX gene. Further silencer-like elements found in the 5' end region include ACCTATGGAA starting at nt -726, CTGAATGGCT at nt -793 and CAATGACT at nt -1467. Interestingly, a very strong promoter activity in the reverse direction is present in a region spanning nt -700 to -750. 99 The sequence elements responsible for this reverse-direction promoter are currently not known. No retroviral LTR-like sequences are present in this region. The presence of the reverse promoter region appears to coincide with a significant reduction (60-70%) of the normal expression activity of the factor IX gene (Figure 6).

Important information on the regulation of the factor IX gene has also been obtained from transgenic mouse experiments. Jallat et al. 119 recently constructed transgenic mice carrying factor IX minigenes with variously shortened intron sequences and with the 5 kb immediate 5' flanking region, which contains the promoter elements as well as all the silencer elements detected in the in vitro assay. Their results clearly indicate that liver-specific high expression of the factor IX gene can be achieved by various constructs with the 5 kb 5' flanking sequence as the promoter and a partial sequence of the first intron. A factor IX cDNA construct (containing no intron sequences) with the same 5 kb 5' flanking sequence shows only background-level expression in transgenic mice. These observations strongly suggest that at least one intron, acting as a set of splicing sequences, or a putative enhancer element(s) that may be present in the first intron must be responsible for obviating the silencer activity in the 5' upstream sequence.

The data obtained from both in vitro and in vivo experiments indicate several important points, including: (i) high-level expression of the factor IX gene can be achieved by the elements contained within the sequence up to about nt -300, (ii) this expression activity is efficiently suppressed by multiple silencers present in the 5' upstream region, and (iii) the reduced activity may be restored to a high level in vivo in the presence of a partial first-intron sequence. The obviation of the silencer activity in the 5' upstream sequence by the first intron sequence, therefore, appears to be a key mechanism underlying the overall regulation of the factor IX gene.

Important observations regarding the developmental regulation of the factor IX gene have been obtained from a unique class of haemophilia B, haemophilia B-Leyden. 120-122 While the normal factor IX gene is induced to high-level expression at the perinatal stage, 103 Leyden phenotype factor IX genes are not expressed, or not induced to high expression, until the onset of puberty. 120 The eleven unique single-base mutations found so far in haemophilia B-Leyden families are at nt -21 (T to G), -20 (T to A or C), -6 (G to A or C), -5 (A to T), +6 (T to A), +8 (T to C) and +13 (A to G or C, or deletion). 30,121-129 Without exception, all these mutations lie in the LS-region (roughly nt -40 to +20) of the factor IX gene (Figure 8). With the revised primary transcription initiation site of the factor IX gene (nt -176), the LS-region lies within the 5' untranslated sequence.

Figure 9. Protein binding at the LS-region and its neighbouring region. The human factor IX sequence is shown by a thick solid line. Proteins which bind to the region, as evidenced by footprinting analyses, are shown by circles. A circle with a thin line indicates the protein binding in the factor IX-Leyden gene with a mutation at nt -20. Numbers in circles indicate the nucleotide residue locations of the natural mutations found in the Leyden phenotype factor IX genes. Numbers below circles indicate the actual sequence regions of the footprints. Known proteins are also shown below the numbers. An arrow indicates the primary transcription start site in liver.

When mutations found at nt -20 (T to A), -6 (G to A) and +13 (A to G or deletion) were tested for their effects using CAT expression vectors, all mutant sequences substantially reduced the expression activity of these constructs to 15-30% of that of the normal sequence, 112,122 indicating the importance of these sequences for factor IX activity. As found by DNase I footprinting analyses, the normal LS-region and its neighbouring region bind five proteins 111,130 (shown schematically in Figure 9).
These include three apparently new proteins, shown as footprints FP-II, FP-III' and FP-IV, 111,130 in addition to the recently reported HNF-1, C/EBP and HNF-4. 114,113,129 In the 5' neighbouring region, NF-1 and an unidentified new protein bind at a region spanning nt -99 to -76 and at a region spanning nt -67 to -44 (FP-IV), respectively. 130 Neither of these new proteins is the glucocorticoid receptor or the androgen receptor. 99,111 Binding of C/EBP, which is present at significant levels in most differentiated cells, to a region spanning nt +3 to +19 is easily detected by both DNase I footprinting analysis and electrophoretic mobility shift analysis. 111,114,130 Recently, the protein binding to the -20 region (FP-III, nt -27 to -17) was identified as HNF-4, a member of the steroid receptor superfamily. 115,129 HNF-4 binding to this region of the normal factor IX gene prevents the binding of another protein to an overlapping androgen responsive element-like sequence (FP-III', nt -36 to -22). 111,130 This protein, however, can competitively bind to this region of the factor IX gene if the gene has a mutation at nt -20, which grossly decreases the binding affinity of HNF-4 for the region. This mutation-dependent competition between the two overlapping regions for protein binding was shown by both gel mobility shift assay and DNase I footprinting analysis. 111,130 The importance of HNF-4 binding to this region, and of the unidentified protein which binds to FP-III', has been further supported by a drastic decrease in the expression activity of the factor IX gene with a mutation at nt -26. 115,129 Although Crossley et al. 115 reported that the protein which binds to the FP-III' region is the androgen receptor (AR), careful protein binding analyses have shown that it is not the androgen receptor but a ubiquitous protein present in nuclear extracts of liver as well as of cultured cells, including not only androgen receptor-positive cells such as T47D, LNCaP and HepG2, but also androgen receptor-negative cells such as CV1 and COS cells. 111,130 The glucocorticoid receptor does not bind to this region, in good agreement with the results from the expression assay. When the Leyden phenotype mutations at nt +13, -6 or -20 are present in the double-stranded oligonucleotides used in the electrophoretic mobility shift assay, the binding affinity of these proteins for the oligonucleotides is grossly decreased, agreeing well with the reduced expression activities observed for the mutant sequences. 111,130
Interestingly, the 3' half of the LS-region, where C/EBP binds at the +13 subregion (FP-I), and the 5' half, where two proteins (HNF-4 and an unidentified protein) bind in the region containing the -20 and -6 subregions (FP-III and FP-II), apparently function with little cooperation. 122 The 3' half, for instance, where C/EBP binds, may require a second unidentified element in the 5' upstream region which is not included in the CAT construct used in the assay. In this regard, Picketts et al. 131 recently reported that interaction of DBP with C/EBP, which binds at the nt -219 to -202 region, may synergistically confer enhancer activity on a factor IX-Leyden gene and may be responsible for the amelioration of haemophilia B-Leyden with a mutation at nt -5. DBP expression is induced in adulthood, but not in childhood. This is an attractive mechanism to explain, at least in part, the Leyden phenotype. However, it has difficulty explaining some important aspects of the Leyden phenotype. These include: (1) If DBP binding in the 5' upstream region can override the defects in the LS-region, why does DBP not ameliorate the mutation at nt -26, which is in the LS-region, after puberty? (2) Why does the normal factor IX gene, which has a normal LS-region, not significantly elevate its expression level after puberty as the DBP level increases? Whether or not DBP can selectively interact with any proteins, including C/EBP, which binds at the 3' half of the LS-region, is not known.

The LS-region of the factor IX gene transcript may assume some secondary structures, such as stem loop structures, albeit not extensive ones, as predicted from the sequence. 122 The functional significance of stem loop structures in the untranslated regions of transcripts has been well documented for other genes, such as the TAR element in HIV-1 or the iron responsive elements in ferritin and transferrin receptor transcripts. 132-134 A possible involvement of these unique structures of the LS-region in its function remains to be determined.

Development of alternative therapies for 
haemophilia B 

Currently, haemophilia B is treated by plasma protein replacement therapy. 135 This therapy is effective, but exposes patients to risks of serious side-effects and complications, such as infection with blood-borne viruses including hepatitis viruses and HIV-1, thrombosis due to contamination with other coagulation factors, and inhibitor (alloantibody) development. 136 A large proportion of haemophilia patients (70-90%) who have received repeated plasma protein replacement therapy are already infected with HIV-1. In addition, the frequent transfusions of factor IX preparations required to treat severely affected patients are highly costly and significantly impair the patients' quality of life.

Large-scale production of recombinant human factor IX by cultured mammalian cells, for safer protein replacement therapy, is currently hampered by the complicated post-translational modifications, such as γ-carboxylation, required for normal factor IX function. 25 The poor efficiency of such modification in cultured cells has been a serious problem in producing recombinant factor IX with a high specific activity in quantity. Unexpectedly, coexpression of cloned γ-carboxylase and factor IX cDNA did not improve γ-carboxylation of recombinant factor IX. 23 Information obtained from these studies, however, should eventually help in preparing recombinant mammalian cells which can express fully carboxylated factor IX. Such cells may be successfully used to produce much safer recombinant factor IX in quantity to substitute for the plasma factor IX preparations currently in use.

A novel approach to alternative haemophilia therapy is somatic gene therapy. 137,138 This approach requires ex vivo or in vivo transfer of the normal human factor IX gene (factor IX minigenes constructed with the cDNA are widely used) into a target tissue of the patient, such as the liver, where the factor IX gene is normally expressed, or other tissues that can support long-term production of biologically active recombinant factor IX without deleterious effects. If such an approach is developed, it may obviate several serious side-effects of the current plasma replacement therapy.

Several cell types, including rodent and haemophilic dog skin fibroblasts, 139,141 endothelial cells, 142 liver hepatocytes, 143 skeletal muscle cells 144 and keratinocytes, 145 have been tested for their ability to produce biologically active recombinant factor IX in culture. The specific activities of the recombinant factor IX preparations reported in these approaches have varied (~70-100% of that of plasma factor IX). The variations are, in part, due to artifacts of the methods used to quantitate the recombinant factor IX secreted into the medium. This problem, however, was recently solved by the introduction of a simple pretreatment procedure using serum with barium sulphate. 146

When genetically modified skin fibroblasts were implanted intradermally or subcutaneously in mice or rats, recombinant factor IX was transiently expressed. Palmer et al. 140 reported that, using recombinant retroviruses containing a cytomegalovirus promoter or a retroviral long terminal repeat promoter, recombinant human factor IX was produced in culture at very high levels (~3.4 or 1.6 μg/10⁶ cells/day in normal human diploid fibroblasts or normal rat diploid fibroblasts, respectively). When these genetically modified cells were implanted into nude mice or rats, transient systemic levels of recombinant factor IX reached 0.18 μg/ml and 0.022 μg/ml plasma, respectively. St Louis and Verma 139 originally reported systemic delivery of recombinant factor IX at a transient level of ~0.1 μg/ml serum in mice by implanting genetically modified mouse skin fibroblasts embedded in collagen under the epidermis. Very inefficient systemic delivery of the produced recombinant factor IX (2-6%), as well as promoter inactivation and poor stability of the promoter used, was observed. These problems obscured the advantage of using skin fibroblasts for this purpose. Scharfmann et al., 147 however, reported that the use of housekeeping gene promoters, such as that of dihydrofolate reductase, may overcome some of these problems. Recently, human applications of skin fibroblast gene therapy have been reported from China. 148 The MoMLV retroviral expression vector, with its LTR as the promoter, was used in these applications. One of the two mildly affected patients who received the therapy showed a limited, transient improvement over several months. This approach still needs a substantial amount of systematic testing of its efficacy and safety before being applied to humans in this country.

Fully active recombinant human factor IX can be produced by rat capillary endothelial cells in culture at a level of 0.84 μg/10⁶ cells/day. 149 A brief account of factor IX production by bovine adrenocortical endothelial cells has also been reported. 141 These results indicate that endothelial cells have all the basic properties necessary to serve as a drug delivery vehicle for producing recombinant factor IX.

Skeletal myoblasts have been described as an efficient gene transfer vehicle for high-level production of recombinant factor IX in the systemic circulation (~1 μg/ml in C3H mice). 143,145 Several important findings in this series of work include: (1) skeletal muscle cells can efficiently express foreign genes, including factor IX, at a high level; (2) skeletal muscle cells have mechanisms for post-translational modification, producing human factor IX with a very high specific activity (81-90%); (3) the efficiency of systemic delivery of recombinant factor IX by muscle is surprisingly high (5-29%); (4) long-term expression in vivo can be achieved; and (5) intramuscularly implanted myoblasts can not only fuse to host myofibres but also survive as quiescent muscle precursor cells (muscle stem cells, presumably satellite cells), further supporting the rationale for using myoblast-mediated gene transfer to develop a long-term, stable gene therapy for haemophilia B.

Extensive efforts targeting the liver are in progress in multiple laboratories. Expression of human factor IX (0.071 μg/10⁶ cells/day) by rabbit hepatocytes using a retroviral vector containing a cytomegalovirus promoter has been reported. 143 Ponder et al. 150 recently reported that implantation, by intrasplenic injection, of bacterial β-galactosidase gene-tagged hepatocytes obtained from transgenic mice (C57BL/6) resulted in deposition and long-term survival (>6 months) of the transplanted cells in the parenchyma, amounting to 0.5% of the entire liver. High-level systemic delivery of human α1-antitrypsin (~5 μg/ml plasma) was also observed with a similar approach, 151 strongly suggesting that this approach may be feasible for developing somatic gene therapy for haemophilia B. Direct factor IX gene transfer into rat liver by receptor-mediated gene transfer has resulted in transient expression of biologically active factor IX in the circulation. 152 The short-lived expression of factor IX observed, however, must be much improved to develop a clinically acceptable gene therapy protocol for haemophilia B.

More recently, in vivo expression of factor IX via ex vivo gene transfer with retrovirally transduced keratinocytes was reported. 145 The expression level was extremely low (≤1-2 ng/day/10⁶ cells) and lasted less than a week, suggesting that further improvement is needed before this procedure can serve the purpose. Low-level expression of dog factor IX (~6 ng/ml plasma) was also observed in partially hepatectomized haemophilia B dogs after infusion of a factor IX retroviral vector. 153

Currently, none of the approaches under investigation is ready for clinical testing for haemophilia in the United States. Within the next 1-3 years, one or more of these approaches may be sufficiently optimized for efficacy and safety to become feasible for clinical application.

Conclusion 

To date, over 600 abnormal factor IX genes have been studied for their molecular mechanisms. This extensive study of factor IX in recent years is largely due to its clinical importance, the availability of its complete, contiguous nucleotide sequence, which was determined in 1985, and the development of readily usable technologies such as the polymerase chain reaction. Furthermore, its multidomain structure, of a size amenable to various protein chemical and recombinant DNA manipulations, has made factor IX an exciting model for studying the structure-function relationships of complex proteins.

With the enormous amount of data accumulated, important new directions for future research on factor IX appear to include its in vivo role in the regulation of thrombosis and haemostasis, the development of alternative therapies, including gene therapy and recombinant factor IX production for safer protein replacement therapy, and its regulation at the level of gene expression. As one of the key factors in the blood coagulation cascade, factor IX will continue to serve as an invaluable model, providing fascinating insights into the intricate mechanism of blood coagulation and its regulation.
Exhibit C

Missense Mutations and Evolutionary Conservation of Amino Acids: Evidence That Many of the Amino Acids in Factor IX Function as "Spacer" Elements

Cynthia D. K. Bottema,* Rhett P. Ketterling,* Setsuko Ii,* Hong-Sup Yoon,* John A. Phillips III,† and Steve S. Sommer*

*Department of Biochemistry and Molecular Biology, Mayo Clinic Foundation, Rochester, MN; and †Division of Genetics, Department of Pediatrics, Vanderbilt University School of Medicine, Nashville



Summary 

We report 31 point mutations in the factor IX gene and explore the relationship between the level of evolutionary conservation of an amino acid and the probability of a mutation causing hemophilia B. From our total sample of 125 hemophiliacs and from those reported by others, we identify 95 independent missense mutations, 94 of which occur at amino acids that are evolutionarily conserved in the available mammalian factor IX sequences. The likelihood of a missense mutation causing hemophilia B depends on whether the residue is also conserved in the factor IX-related proteases: factor VII, factor X, and protein C. Most of the possible missense mutations in generically conserved residues (i.e., those conserved in factor IX and in all the related proteases) should cause disease. In contrast, missense mutations in factor IX-specific residues (i.e., those conserved in human, cow, dog, and mouse factor IX but not in the related proteases) are sixfold less likely to cause disease. Missense mutations at nonconserved residues are 33-fold less likely to cause disease. At least three models are compatible with these observations. A comparison of sequence alignments from four and nine species of factor IX and an examination of the missense mutations occurring at CpG residues suggest a model in which most residues fall on opposite ends of a spectrum. In a minority of the residues (about 40%), virtually any missense mutation will cause disease, while virtually no missense mutations will cause disease in most of the remaining residues. Thus, many of the residues in factor IX are spacers; that is, the main chains are presumably necessary to keep other amino acid interactions in register, but the nature of the side chain is unimportant.



Introduction 

Factor IX is a coagulation serine protease zymogen with eight functional domains, including (1) a signal peptide, (2) a pro-peptide which is necessary for the γ-carboxylation of the mature protein, (3) a gla domain with 12 γ-carboxyglutamic acid (gla) residues which bind four to six molecules of calcium, (4) a short aromatic amino acid stack, (5) a first epidermal growth factor domain which contains a high-affinity calcium-binding site, (6) a second epidermal growth factor domain of unknown function, (7) an activation peptide which is removed during proteolysis by factors VII or XI, and (8) a catalytic domain which activates factor X (reviewed in Furie and Furie 1988). Factor IX, factor VII, factor X, and protein C are closely related coagulation serine proteases that have the same eight functional domains and similar gene structure (Furie and Furie 1988).

Since the factor IX gene is located on the X chromosome, a mutation that disrupts function affects any male who receives that allele. Many mutations in the factor IX gene causing hemophilia B have been described (reviewed in Giannelli et al. 1990). The mutation rate is dramatically enhanced at CpG dinucleotides (Koeberl et al. 1989; Green et al. 1990) but not at any other dinucleotides (Bottema et al. 1991).
Herein we present 31 new families with point muta-
tions and analyze the relationship between evolution- 
ary conservation of amino acids and missense muta- 
tions which cause hemophilia B in humans. 

Methods 

Sequencing 

DNA was extracted from blood collected in ACD solution B or solution A as previously described (Gustafson et al. 1987). Regions of likely functional significance were sequenced by genomic amplification with transcript sequencing (GAWTS) (Stoflet et al. 1988) as described by Sommer et al. (1990). GAWTS is a method of direct sequencing that involves (1) PCR amplification of the segment of interest, where at least one of the PCR primers has an attached phage promoter sequence, (2) transcription of the amplified segment with the phage RNA polymerase to produce a single-stranded RNA molecule, and (3) sequencing of the RNA template with reverse transcriptase. The following bases were sequenced (numbering system corresponds to that of Yoshitake et al. [1985]): region A, -106 to 139; region B/C, 6720 to 6265; region D, 10544 to 10315; region E, 17847 to 17601; region F, 20577 to 20334; region G, 30183 to 29978; and region H, 31411 to 30764. The poly(A) addition region was not sequenced. The order of the numbers in each region indicates the direction of sequencing. In all, at least 2.2 kb was sequenced from each hemophiliac.
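The stated total of "at least 2.2 kb" can be checked by summing the inclusive lengths of the listed regions. This is a minimal sketch under two assumptions: the Yoshitake numbering has no position 0 (so -1 is followed by +1), and the scanned value "17S47" for region E is read as 17847. The dictionary and function names are illustrative.

```python
# Regions sequenced in each patient, as (first, last) base in the direction
# of sequencing (Yoshitake et al. 1985 numbering).
REGIONS = {
    "A": (-106, 139),
    "B/C": (6720, 6265),
    "D": (10544, 10315),
    "E": (17847, 17601),  # printed "17S47" in the scanned text; read as 17847
    "F": (20577, 20334),
    "G": (30183, 29978),
    "H": (31411, 30764),
}


def region_length(first, last):
    """Number of bases in an inclusive region, assuming the numbering skips 0."""
    lo, hi = min(first, last), max(first, last)
    length = hi - lo + 1
    if lo < 0 < hi:
        length -= 1  # -1 is followed directly by +1
    return length


lengths = {name: region_length(*span) for name, span in REGIONS.items()}
total = sum(lengths.values())
print(lengths)
print(total)  # 2276
```

The result, 2276 bases, is consistent with the "at least 2.2 kb" in the text; counting a position 0 in region A would add only one base.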

Haplotype Analysis

The following polymorphisms in the factor IX gene were examined: HinfI (intron a) (Winship et al. 1984), XmnI (intron c) (Winship et al. 1984), TaqI (intron d) (Camerino et al. 1984), and HhaI (3' of gene) (Winship et al. 1989). From these four polymorphisms, eight common haplotypes were defined, with frequencies of 2%-19% in the normal Caucasian population (Ketterling et al., in press).

DNA segments containing the TaqI and the XmnI restriction sites were amplified by PCR and digested with the appropriate restriction enzyme as described elsewhere (Koeberl et al. 1990). The products were gel electrophoresed, and the presence (+) or absence (-) of the restriction site was determined. For the HinfI (also known as DdeI) polymorphism, the DNA was amplified by PCR, and the presence (+) or absence (-) of the 50-bp insert was determined by gel electrophoresis. The HhaI polymorphism was determined by amplifying 500 ng genomic DNA by PCR using 0.1 μM of the previously described oligonucleotides H1 and H2 (Winship et al. 1989) and 1.5 mM MgCl2 in 50-μl reactions. The PCR products were digested with HhaI, and the presence (+) or absence (-) of the restriction site was determined by gel electrophoresis.

Levels of Amino Acid Conservation in Factor IX 

Four classes of residues can be defined from the available factor IX sequences and from the sequences of both human and bovine factor VII, factor X, and protein C (fig. 1 and Appendixes A and B). A residue is "generic" if it is identical in all species of factor IX and is also identical in the three related blood coagulation serine proteases: factor VII, factor X, and protein C. The residue is "factor IX specific" if it is identical in all species of factor IX but not identical in any of the three related proteases. The residue is "partially generic" if it is identical in all species of factor IX and is identical in one or two of the three related proteases. If a residue is conservatively substituted in the species of factor IX (i.e., is S/T, S/A, Y/F, R/K, I/L/V, N/D, D/E, Q/N, or E/Q), the above definitions are modified to also allow the conservative substitution in the three related proteases. If a residue is nonconservatively substituted in any of the species of factor IX, it is classified as "nonconserved."

The above definitions differ from a previous classification (Koeberl et al. 1990; Sarkar et al. 1990) in that the sequence alignment utilizes the complete sequences of dog factor IX (Evans et al. 1989), mouse factor IX (Wu et al. 1990), and bovine factor VII (Takeya et al. 1988) (Appendixes A and B). Most important, the presently defined factor IX-specific residues and most of the partially generic residues were previously combined into one class.

A detailed protocol was used for assigning residues to each class (see Identity subsection). Our conclusions remain the same despite certain revisions in the classification protocol (see Conservative substitutions subsection).

1. Identity. Those amino acids identical in the mammalian factor IX sequences were compared with the homologous residues in the related coagulation serine proteases. The amino acid was assigned to a class on the basis of extent of identity with these proteases. For a residue to be considered conserved in a given related protease, both the human and bovine residues needed to be identical with the corresponding factor IX amino acid.
2. Conservative substitutions. Each amino acid which was not identical in the factor IX sequences was considered to be nonconserved unless the substitutions were highly conservative. These highly conservative substitutions were S/T, S/A, E/D, D/N, Q/E, Q/N, F/Y, K/R, and I/L/V (see fig. 1 legend for the single-letter amino acid code). Conservative substitutions were defined rather stringently in that, with one exception (I/L/V), only two residues constitute each conservation group. As examples, D (aspartate) and N (asparagine) are in one group because they are related polar residues of approximately the same volume, and D and E (glutamate) are in another group because they both have a negative charge. Thus, at any given D, the conservative substitutions were limited to either charge or size: the presence of D and E in the factor IX sequences is classified as a conservative substitution, whereas the presence of D, E, and N is classified as nonconservative. (The alternative possibility of allowing any combination of either (a) D, E, Q, and N or (b) S, T, and A converts only four nonconserved residues to the conserved class and does not alter any of the conclusions.) The conservatively substituted factor IX amino acids were then compared with the related serine proteases and were assigned to one of the above classes (i.e., generic, partially generic, or factor IX specific).

In practice, almost all of the generic, partially generic, and factor IX-specific residues are identical rather than conservatively substituted. Conservative substitutions in factor IX occur in only 8% of the 364 conserved amino acids. If conservative substitutions are not allowed, the number of residues involved in hemophilia B remains virtually unchanged (182 amino acids without substitutions vs. 176 amino acids with conservative substitutions [table 3]). However, the number of mutations in nonconserved residues increases from one to seven if conservative substitutions are not allowed: mutations were observed at one N/D site, four I/L/V sites, and one T/S conservative site (see fig. 1).

3. Mutations at CpG versus non-CpG sites. Since mutations at the dinucleotide CpG occur more frequently than those at other sites (Koeberl et al. 1989; Green et al. 1990), the mutations at CpG and non-CpG nucleotides were analyzed separately (see tables 3 and 5). Residues at CpG dinucleotides were assigned to the non-CpG or CpG categories. This assignment was based on the fraction of possible mutations at CpG and non-CpG nucleotides in each residue (table 2). As examples, all of the arginine residues with codons of CGX were assigned to the CpG dinucleotide group. However, a glycine residue preceded by a residue ending in C (XXC GGX) had the first G assigned to the CpG group and the second G assigned to the non-CpG group. Although one-third of all independent mutations occur at CpG, the rarity of this dinucleotide in the factor IX coding sequence means that the CpG group accounts for less than 3% of all the possible kinds of missense mutations.
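The nucleotide-level part of this assignment (the C of every CpG and the G that follows it fall in the CpG group, regardless of codon boundaries) can be sketched as follows. This is a minimal illustration, assuming an in-frame sense-strand coding sequence; it only flags positions and does not reproduce the per-residue apportioning by fraction of possible mutations used for table 2. The function names are hypothetical.

```python
def cpg_flags(seq):
    """Return one flag per nucleotide: True if it is the C or the G of a CpG dinucleotide."""
    seq = seq.upper()
    flags = [False] * len(seq)
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            flags[i] = flags[i + 1] = True
    return flags


def codon_cpg_flags(seq):
    """Group the per-nucleotide flags by codon (an incomplete trailing codon is dropped)."""
    flags = cpg_flags(seq)
    n_codons = len(seq) // 3
    return [tuple(flags[3 * k:3 * k + 3]) for k in range(n_codons)]


def cpg_fraction(seq):
    """Fraction of nucleotide positions that fall in the CpG group."""
    flags = cpg_flags(seq)
    return sum(flags) / len(flags) if flags else 0.0
```

For the XXC GGX case, `codon_cpg_flags("AACGGT")` marks the last C of the first codon and only the first G of the glycine codon, matching the example in the text; an arginine codon such as CGT has both its C and its G flagged.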



Results 

Mutations 

Point mutations were delineated in 31 families with hemophilia B by direct genomic sequencing of the regions of likely functional significance, which include the coding region, the splice junctions, the putative promoter, the 5' untranslated region, and a small part of the 3' untranslated region (Koeberl et al. 1989). In total, 66 kb of sequence were obtained. While the majority of the hemophiliacs are Caucasians of northern-European descent, two (HB101 and HB102) are Hispanic and one (HB109) is Japanese.

Of the 31 point mutations, two affect splice junctions and six produce nonsense mutations, but the great majority (23) are missense mutations (table 1). Only one sequence change was found in each individual. The splice junction mutations, which disrupt known consensus sequences, as well as the nonsense mutations, which result in truncated protein products, are clearly causative mutations. In addition, we conclude that the missense mutations are also all causative because (1) they are the only sequence change found in the regions of likely functional significance, (2) polymorphisms in these regions are rare (Koeberl et al. 1989), (3) these changes are not present in normal individuals or as second-site changes in other hemophiliacs, and (4) the missense mutations (except for one) are all at evolutionarily conserved residues (fig. 1).

Twenty-two of these mutations have not been previously described. Two mutations, at a serine and a lysine residue, occur at residues specifically conserved in factor IX but not in the related proteases (fig. 1). Thus, these mutated residues may be important for factor IX-specific interactions such as binding to factors VIII or VII. The asparagine→isoleucine change in HB108 is the first reported missense mutation in a hemophiliac to occur in a nonconserved residue. The tryptophan→arginine change represents the first non-CpG site at which two patients (HB20 and HB92) have the same mutation in a different haplotype, indicating that these two mutations are due to independent events.

Figure 1 Factor IX missense mutations and amino acid conservation. Both the conservation of the amino acid residues in factor IX and the location of missense mutations are shown. • = Missense mutation at a given residue from our sample; ○ = missense mutation at a given residue reported by others (Giannelli et al. 1990). Multiple symbols (• or ○) indicate the number of known independent missense-mutation changes at a given residue. For CpG dinucleotides, separate triangle symbols mark (i) conserved residues in which transitions and transversions will cause a nonconservative missense substitution, (ii) conserved residues where only transversions will cause a nonconservative missense substitution, and (iii) a missense mutation not causing disease (Montandon et al. 1990). The geometric shapes indicate the degree of conservation of an amino acid residue, as determined from factor IX and related serine protease alignments (Appendixes A and B): circles represent "generic" residues, squares represent "factor IX-specific" residues, and pentagons represent "partially generic" residues, as defined in Methods. An asterisk (*) inside the geometric shape indicates that conservative substitutions occurred in the mammalian species of factor IX. The alignment of factor IX is based on the available mammalian species of factor IX and on both the human and bovine sequences of factor X, factor VII (Takeya et al. 1988), and protein C (Koeberl et al. 1990; Sarkar et al. 1990) (see Appendixes A and B). A, Alignment of N-terminal segment of factor IX, based on four available mammalian species: human (Yoshitake et al. 1985), cow (Katayama et al. 1979), dog (Evans et al. 1989), and mouse (Wu et al. 1990). The translation start site is based on additional data from rat and macaque sequences (Pang et al. 1990). B, Alignment of C-terminal segment of factor IX (activation peptide and catalytic domain), based on nine mammalian species: human, sheep, pig, rabbit, guinea pig, rat, mouse, cow, and dog (Evans et al. 1989; Sarkar et al. 1990). The single-letter code for amino acids is as follows: A = Ala; R = Arg; N = Asn; D = Asp; C = Cys; Q = Gln; E = Glu; G = Gly; H = His; I = Ile; L = Leu; K = Lys; M = Met; F = Phe; P = Pro; S = Ser; T = Thr; W = Trp; Y = Tyr; and V = Val. Stars (★) indicate the serine protease catalytic triad amino acids. Note that the classification in table 3 is based on factor IX from only four species and that only non-CpG residues were tabulated.

Missense Mutations versus Extent of Evolutionary 
Conservation 

To determine the relationship between missense mutations in hemophiliacs and the amino acid conservation classes, haplotype analysis was used to determine whether recurrent mutations were independent events. Mutations at non-CpG dinucleotides were considered separately from those at CpG dinucleotides, since mutations at CpG dinucleotides are greatly enhanced. Although the CpG category represents less than 3% of all the missense mutations that are possible, the analysis of this category is important (see below).



At least 38 independent missense mutations were found at non-CpG residues (table 1; Bottema et al. 1990a; Koeberl et al. 1990). We combined these data with data on 43 additional non-CpG mutations reported in the literature (reviewed in Giannelli et al. 1990). Since haplotype data are not generally available on patients reported by others, it was not possible to identify recurrent mutations that are due to independent mutational events. However, in our sample, extensive haplotype analysis strongly suggests that only one of the 25 recurrent mutations at non-CpG dinucleotides is independent (Bottema et al. 1989, 1990b; Ketterling et al. 1991, and in press). Thus, it seems reasonable to assume that all recurrent mutations reported in the literature are the result of common ancestors. Common ancestry has been well documented for the most frequently recurring non-CpG mutation: isoleucine397→threonine (Bottema et al. 1990b; Thompson et al. 1990).
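The rule applied to recurrent mutations (carriers who share a haplotype are presumed to share an ancestor, while a different haplotype or a documented germ-line origin indicates a separate event) reduces to a simple count. This is a minimal sketch: the tuple layout, the mutation labels, and the haplotype labels are hypothetical, and haplotypes are compared as exact string matches.

```python
from collections import defaultdict


def independent_events(cases):
    """Estimate independent mutational events per recurrent mutation.

    cases: iterable of (mutation, haplotype, origin_documented) tuples.
    Carriers sharing a haplotype are counted once (presumed common ancestor);
    each case with a documented germ-line origin is counted as its own event.
    """
    haplotypes = defaultdict(set)
    documented = defaultdict(int)
    for mutation, haplotype, origin_documented in cases:
        if origin_documented:
            documented[mutation] += 1
        else:
            haplotypes[mutation].add(haplotype)
    mutations = set(haplotypes) | set(documented)
    return {m: len(haplotypes[m]) + documented[m] for m in mutations}
```

Two patients carrying the same change on different haplotypes (as with HB20 and HB92) yield two events, whereas any number of carriers on one haplotype yield one. Treating shared haplotypes as a single event makes this a conservative lower bound, which is the assumption the text adopts for recurrent mutations reported without haplotype data.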

Four classes of factor IX residues can be defined on the basis of the extent of evolutionary conservation. These classes are (1) generic residues that are conserved in factor IX and in the related coagulation serine proteases factor VII, factor X, and protein C, (2) partially generic residues that are conserved in one or two of the related coagulation proteases, (3) factor IX-specific residues that are conserved in factor IX but in none of the related coagulation proteases, and (4) nonconserved residues (see Methods).

The amino acids in factor IX are distributed between the four classes in a nonrandom manner (table 2). Almost one-half of the generic residues are either cysteine, glycine, or glutamate. The cysteine residues are involved in disulfide bonds (fig. 1), and most of these glutamate residues are modified to γ-carboxyglutamic acids necessary for calcium binding (Furie and Furie 1988) (table 2). A substantial fraction of the charged generic residues (10 glutamates and three aspartates) are involved in the chelation of calcium (Rees et al. 1988). If calcium binding is ignored, charged residues are substantially underrepresented in the generic and partially generic classes. In contrast, codons having a high G + C content are substantially overrepresented (Bottema et al. 1991). If the generic and partially generic residues are combined, cysteine, glycine, and tryptophan are significantly overrepresented, while threonine, asparagine, and lysine are underrepresented (for P values, see table 2, footnotes b and c).

Both the classification of each residue in factor IX and the location of all the missense mutations observed in patients with hemophilia B were determined (fig. 1). The non-CpG missense mutations are distributed throughout the factor IX protein. However, missense mutations causing hemophilia B are most likely to occur at generic residues (table 3). Mutations at the partially generic residues are about twofold less likely to produce hemophilia B, and mutations at the factor IX-specific residues are sixfold less likely to produce hemophilia B (i.e., the likelihood that they will produce the disease is only 15% of that for generic residues) (table 3). Mutations at nonconserved residues are 33-fold less likely to produce hemophilia B, indicating that these residues are almost never involved in this disease.

Three models for these observations can be envisioned. These models will be stated in the context of possible explanations for the low frequency of causative missense mutations at factor IX-specific residues:

1. At 15% of the factor IX-specific residues, most, if not all, of the possible amino acid substitutions will cause disease. The remaining 85% of the factor IX-specific residues are not essential. In this case, factor IX sequences from additional nonmammalian species should allow sufficient evolutionary time for most nonessential residues to be substituted, while the essential 15% of functionally important residues will remain conserved.

2. Mutations at 100% of the factor IX-specific residues can cause disease, but, on average, at any given site only 15% of the 19 possible amino acid substitutions will cause disease, and 85% of the substitutions will not. In this case, the number of factor IX-specific residues will approach zero as progressively more factor IX sequences are added. Thus, the additional factor IX sequences will increase the fraction of causative mutations occurring at nonconserved residues.

3. Missense mutations at 15% of the factor IX-specific residues cause hemophilia B, and the remaining 85% cause an as yet undefined disease, such as a hypercoagulability that might be lethal prenatally. In this case, the 141 factor IX-specific residues should remain conserved if other factor IX sequences are added.

To help distinguish between these possibilities, sequence data were analyzed from the C-terminal segment (the activation and catalytic domain) of an additional five species of factor IX (Sarkar et al. 1990). An alignment of all nine species (C-terminal 9) versus the alignment of four species (C-terminal 4) indicates that 27 residues (13%) are now reclassified as nonconserved (table 4A). None of these reclassified residues were generic in the initial C-terminal 4 alignment. In contrast to the prediction of model 3, 12% of the partially generic and 21% of the factor IX-specific residues were reclassified as nonconserved (table 4B). More important, no missense mutations have been reported to cause hemophilia B in any of the 27 reclassified nonconserved residues. If there had been a corresponding decline in the number of missense mutations at conserved residues, as predicted by model 2, an additional 3.5 mutations would be expected at nonconserved residues (table 4B). However, none were observed (P < .002). While the data best support the predictions of model 1, more cross-species sequences and a larger sample of mutations are necessary to eliminate a hybrid model that contains some contribution from model 2 and/or model 3.
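The comparison of the four-species and nine-species alignments is a per-class tally of conserved residues that drop to nonconserved. The sketch below shows that tally, assuming each alignment has already been classified into a dict of residue position to class label; the function name and the example positions are hypothetical, and the toy data are not the factor IX data behind table 4B.

```python
from collections import Counter


def reclassification_table(class4, class9):
    """Summarize conserved residues (C-terminal 4) that become nonconserved (C-terminal 9).

    class4, class9: dicts mapping residue position -> conservation class label.
    Returns {class: (n_residues, n_converted, percent_converted)} for each conserved class.
    """
    totals, converted = Counter(), Counter()
    for position, cls in class4.items():
        if cls == "nonconserved":
            continue  # only residues conserved in the four-species alignment are tracked
        totals[cls] += 1
        if class9[position] == "nonconserved":
            converted[cls] += 1
    return {
        cls: (totals[cls], converted[cls], 100.0 * converted[cls] / totals[cls])
        for cls in totals
    }
```

Model 3 predicts that the percent converted should be near zero for every conserved class; model 1 predicts conversions concentrated in the partially generic and factor IX-specific classes, with generic residues staying put.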

Table 4

Amino Acid Conservation and Missense Mutations in Activation and Catalytic Domains of Factor IX at Non-CpG Residues in C-terminal 4 versus C-terminal 9

A. Amino Acid Conservation in C-terminal 4 vs. C-terminal 9

                                      Total Conserved               Total Nonconserved
                                      C-terminal 4  C-terminal 9    C-terminal 4  C-terminal 9
No. of residues                       218           191             54            81
No. of missense mutations observed    48            48              1             1

B. Conserved C-terminal 4 Amino Acids Converted to Nonconserved in C-terminal 9

                                                 Generic  Partially Generic  Factor IX Specific  Total
No. of residues converted(a)                     0        10                 17                  27
Residues converted in each C-terminal 4
  conserved class                                0%       12%                21%                 12%
No. of observed missense mutations expected
  to become C-terminal 9 nonconserved(b)         0        4.1                2.6                 6.7

NOTE. Conservation is defined as in Methods. Conservation of the carboxy segment of factor IX was determined on the basis of an alignment of either C-terminal 4 or C-terminal 9. Both alignments utilize sequence from human, mouse, cow, and dog. The C-terminal 9 alignment also includes sequence from sheep, pig, rat, guinea pig, and rabbit (Sarkar et al. 1990) (see Appendix A). Similar alignments have been published elsewhere (Sarkar et al. 1990; Wu et al. 1990).

(a) Residues conserved in the C-terminal 4 alignment that convert to nonconserved residues in the C-terminal 9 alignment.

(b) Number of observed missense mutations expected, on the basis of model 2, to become C-terminal 9 nonconserved; this number is calculated by multiplying the percent of C-terminal 4 conserved residues that convert to nonconserved in the C-terminal 9 alignment by the relative probability of missense mutation causing hemophilia B (table 3). For partially generic residues, 10 × .41 = 4.1, and for factor IX-specific residues, 17 × .15 = 2.6.

The pattern of transitions at CpG dinucleotides further supports model 1. Since the mutation rate in factor IX is much higher at CpG dinucleotides than at non-CpG dinucleotides (Koeberl et al. 1989; Green et al. 1990), the mammalian factor IX sequences at CpG dinucleotides should rapidly mutate at nonconserved
amino acids. Therefore, the pattern observed in mam- 
malian sequences at CpG dinucleotides should be 
analogous to the pattern of non-CpG conserved dinu- 
cleotides that would be observed in more diverged 
species. Transitions at the 15 conserved CpG nucleotides should cause a missense mutation resulting in disease (table 5). Transitions at 12 of the 15 possible CpG sites have been observed. These transitions have occurred in three of four generic residue sites, in five of six partially generic residue sites, and, most important, in four of five factor IX-specific residue sites. Thus, there is an almost perfect correlation between evolutionary conservation and missense mutations causing hemophilia B. Furthermore, there are four arginine residues (codons CGX, where X is any base except A) in which transitions at either C or G will produce a missense mutation. Mutations at all six evolutionarily conserved sites have been observed to cause hemophilia B, while neither of the two possible transitions has been observed in the one nonconserved CGX arginine residue (R403).
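Why the CGA arginine codon is excluded can be checked with the standard genetic code. The sketch below applies the CpG transition to each nucleotide of a CpG lying within one codon (C→T on the C; G→A on the G, which is the same event read from the opposite strand) and reports the coding consequence. It only handles a CpG contained in a single codon; the helper names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
TRANSITION = {"C": "T", "T": "C", "G": "A", "A": "G"}


def transition_effect(codon, pos):
    """Classify the transition at position pos of codon as silent, missense, or nonsense."""
    mutant = codon[:pos] + TRANSITION[codon[pos]] + codon[pos + 1:]
    before, after = GENETIC_CODE[codon], GENETIC_CODE[mutant]
    if after == before:
        return "silent"
    if after == "*":
        return "nonsense"
    return "missense"


def cpg_transitions_in_codon(codon):
    """Effects of transitions at each C and G of a CpG lying entirely within one codon."""
    effects = {}
    for i in range(2):
        if codon[i] == "C" and codon[i + 1] == "G":
            effects[i] = transition_effect(codon, i)
            effects[i + 1] = transition_effect(codon, i + 1)
    return effects


for codon in ("CGT", "CGC", "CGA", "CGG"):
    print(codon, cpg_transitions_in_codon(codon))
```

For CGT, CGC, and CGG both transitions are missense (to Cys or Trp via the C, to His or Gln via the G), whereas for CGA the C→T transition produces the stop codon TGA, so only the G transition gives a missense change. This is the reason only CGX arginines with X other than A contribute two missense sites.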

Percent of Missense Mutations That Cause Hemophilia B

Factor IX, factor X, factor VII, and protein C diverged about 450-500 million years ago (Doolittle and Feng 1987). If a residue is identical in these proteases despite such a long period of evolutionary time, it is very likely to be absolutely essential for protein function. The essential nature of such generic residues (102 total) is supported by an analysis of the residues conserved in the C-terminal 4 of factor IX versus those in the C-terminal 9 of factor IX (table 4B). Generic residues are absent from the group of 27 amino acids that, as a result of the C-terminal 9 alignment, have been reclassified as nonconserved (table 4B). Additional support comes from an alignment of human and bovine factor VII, factor X, and protein C. One hundred six residues are identical in these proteins. If the four species of factor IX are added to the alignment, virtually all (96%) of these generic residues remain identical, despite an additional 450+ million years of evolutionary divergence. Thus, we conclude that most, if not all, possible missense mutations at generic residues will cause disease. From the relative frequencies of mutations in each class, it can be estimated that only 40% of all possible missense changes in factor IX will cause hemophilia B (table 3). In the context of model 1, the estimate implies that 40% of factor IX residues are important for function and that most, if not all, missense mutations in these residues will cause disease.

Table 5

Missense Changes in Factor IX That Are Due to Transitions at CpG Dinucleotides, as Function of Amino Acid Conservation

                                        Amino Acid Conservation Corresponding to C or G Nucleotide at CpG(a)
                                                 Partially  Factor IX  Total
                                        Generic  Generic    Specific   Conserved  Nonconserved  Total
A. No. of C or G nucleotides at CpG(b)  4        6          5          15         6             21
B. No. of our independent missense
   mutations(c)                         ?        ?          5          ?          0             ?
C. No. of sites at which missense
   transitions have been reported(d)    3        5          4          12         0             12

(a) At certain arginine residues (CGX), a transition at either C or G will cause a missense mutation, while in other cases transitions at only G will cause a missense mutation. Thus, each transition which causes a missense mutation is counted separately.

(b) Nucleotides in which transitions result in missense mutations.

(c) Data include our independent transitions resulting in missense mutations at CpG dinucleotides of factor IX. Multiple mutations at the same site were judged as independent only if the haplotype differed or if a germ line of origin could be determined (table 1; Bottema et al. 1990a; Koeberl et al. 1990).

(d) Data are from table 1 and from Bottema et al. (1990a), Giannelli et al. (1990), and Koeberl et al. (1990).

Discussion

Prediction of Missense Mutations That Cause Hemophilia

We have analyzed (a) amino acid changes that disrupt factor IX function in hemophiliacs and (b) amino acid changes that are compatible with normal factor
IX function in different species. Four classes of factor 
IX amino acids were defined on the basis of the extent 
of evolutionary conservation. We document the func- 
tional importance of most, if not all, of the generically 
conserved residues. However, residues uniquely con- 
served in factor IX are sixfold less likely to give rise to 
hemophilia B. Thus, many mutations at factor IX-specific residues should be neutral variants. One such neutral variant in a factor IX-specific residue has recently been discovered: a histidine is nonconservatively changed to tyrosine without causing hemophilia B in a male (Montandon et al. 1990).

Both (1) the relationship between missense mutation and amino acid conservation class as defined by the carboxy segment of factor IX in C-terminal 4 versus that in C-terminal 9 and (2) an analysis of missense mutations at CpG dinucleotides suggest that about 40% of the residues in factor IX are crucial for function. Most of the possible missense mutations in the remaining 60% of the residues will not cause hemophilia B; these remaining residues are likely to be "spacers," i.e., residues which maintain the position of critical amino acids but whose own side chains are not crucial for function (Doolittle and Blomback 1964). Many of these spacer residues are classified as factor IX specific because the evolutionary time separating the mammalian factor IX sequences is insufficient to have changed many nonessential residues. The conclusions predict that nonmammalian factor IX sequences should convert many of these nonessential residues to the nonconserved class and substantially increase the likelihood that a mutation in the remaining factor IX-specific residues will cause disease.
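One way to read the 40% estimate under model 1 is as a weighted average: each conservation class contributes its number of residues multiplied by the relative probability that a missense change there causes hemophilia B, with generic residues set to 1. The sketch below shows only that arithmetic, as an interpretation rather than the authors' exact calculation. The relative probabilities come from the text (about 0.41 for partially generic, 0.15 for factor IX-specific, and roughly 1/33 for nonconserved residues), but the per-class residue counts of table 3 are not reproduced in this excerpt, so the example class sizes are hypothetical.

```python
def fraction_disease_causing(classes):
    """Weighted fraction of residues at which missense changes are expected to cause disease.

    classes: mapping class name -> (n_residues, relative_probability), where the
    relative probability is normalized so that generic residues have 1.0.
    """
    total = sum(n for n, _ in classes.values())
    weighted = sum(n * p for n, p in classes.values())
    return weighted / total


def fold_less_likely(relative_probability):
    """Convert a relative probability (e.g., 0.15) into an 'N-fold less likely' figure."""
    return 1.0 / relative_probability


# Relative probabilities quoted in the text; real class sizes would come from table 3.
RELATIVE_PROBABILITY = {
    "generic": 1.0,
    "partially generic": 0.41,
    "factor IX specific": 0.15,
    "nonconserved": 1 / 33,
}
```

The second helper connects the two ways the text states the same result: `fold_less_likely(0.15)` is about 6.7 ("sixfold less likely"), and `fold_less_likely(0.41)` is about 2.4 ("about twofold").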

Saturation in vitro mutagenesis of selected residues in factor IX and expression in tissue culture could help confirm that factor IX function will be significantly compromised by the amino acid substitutions found in hemophiliacs. However, the generation, confirmation, and characterization of such a large number of mutations would require many years of effort. In addition, the interpretation of data indicating that a mutation retains functional integrity is confounded by the simplicity of a cell culture system in comparison with the intact organism (i.e., mutants that score as functional in cell culture could still cause hemophilia B in humans). It would be preferable to generate a battery of transgenic mice with hemophilia B, but both the absence of a mouse model for hemophilia B and the difficulties in generating a large number of transgenic mice with independent mutations pose major technical challenges. We conclude that sequencing factor IX in more nonmammalian species and delineating the mutations in a larger sample of hemophiliacs is currently the best way to determine which missense mutations will result in disease.

The generality of the present findings can be as- 
sessed by examining the correlation between evolu- 
tionary conservation and mutations causing other se- 
vere X-linked diseases. Hemophilia A would be a good 
choice, because factor VIII belongs to a different gene 
family and more than 100 missense mutations will 
soon be available. Autosomal genes such as α- and β-globin are not good candidates for assessing the relationship between evolutionary conservation and missense mutations that disrupt function. In the cases of α- and β-globin, the marked overrepresentation
of dominant mutations, heterozygote advantage, 
founder effect, and the biased methods of patient as- 
certainment pose major problems in the interpretation 
of the data. As an example of the problems, the aggre- 
gate mutational data in globin erroneously indicated 
that CpG was not a hot spot of mutation (Vogel and 
Motulsky 1986). 

Mutant Analyses in Other Genes 

The data, albeit meager, from saturation in vitro mutagenesis in other systems are compatible with the notion that, if one missense mutation at a residue disrupts function, then the other possible missense mutations are also very likely to disrupt function. In Escherichia coli, saturation mutagenesis of evolutionarily conserved residues in the region of the β-lactamase active site revealed that 14 of the 19 possible amino acid substitutions retained appreciable activity toward the penicillins (Schultz and Richards 1986). However, in all but two of the substitutions the limited characterization performed was sufficient to reveal major reductions in catalytic specificity and/or thermal stability, strongly suggesting that all these mutants would be at a selective disadvantage in vivo. Moreover, in a follow-up study using saturation mutagenesis at five codons, partial activity could commonly be found, but no mutant protein had the catalytic specificity and thermal stability of a wild-type protein (Dube and Loeb 1989). Finally, in NIH 3T3 cells, a study of substitutions at the conserved glycine 12 of the Harvey ras protein indicated that 18 of 19 amino acid substitutions produced a transformed phenotype (Seeburg et al. 1984).

Analysis of the N-terminal segment of the lambda repressor by cassette mutagenesis indicates both some residues in which only the wild-type side chain is acceptable and other residues in which either a few or many substitutions are acceptable (Reidhaar-Olson and Sauer 1988; Lim and Sauer 1989). However, interpretation of the data is complicated by (1) the generation of multiple potentially compensatory mutations by cassette mutagenesis, (2) the biases associated with a mutation method which relies on equal pairing of inosine with all bases, (3) the use of only the N-terminal fragment of the repressor, and (4) the use of a selection or screening scheme without knowledge of how that translates into the fitness of the viral protein in its ecosystem.

Caveats 

Multiple base substitutions at a single codon are 
rare, and no such missense mutations have yet been 
reported in the factor IX gene. If the present pattern 
is representative of the past, a residue will be conserved 
through evolution if the five to eight possible single- 
base missense changes all cause disease. It is conceiv- 
able that disease will not be caused by missense 
changes that arise from multiple base substitutions at 
a codon. However, this seems unlikely because 94% 
of the generic residue sites do not tolerate even highly 
conservative substitutions which usually involve a 
single-base change (see Methods and fig. 1). There- 
fore, the more drastic missense changes that com- 
monly result from substitutions at two and three bases 
are unlikely to be tolerated. 
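The caveat rests on counting the single-base missense changes available at each codon. The sketch below enumerates all nine single-nucleotide substitutions of a sense-strand codon with the standard genetic code and tallies silent, missense, and nonsense outcomes. The function name is hypothetical, and the input is assumed to be an uppercase DNA codon that is not itself a stop codon.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR" "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def single_base_outcomes(codon):
    """Tally the outcomes of all nine single-base substitutions of a codon."""
    counts = {"silent": 0, "missense": 0, "nonsense": 0}
    reference = GENETIC_CODE[codon]
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a substitution
            amino_acid = GENETIC_CODE[codon[:pos] + base + codon[pos + 1:]]
            if amino_acid == reference:
                counts["silent"] += 1
            elif amino_acid == "*":
                counts["nonsense"] += 1
            else:
                counts["missense"] += 1
    return counts
```

For example, the tryptophan codon TGG has seven single-base missense neighbors and two nonsense neighbors, whereas GGG (glycine) has six missense neighbors because all three third-position changes are silent. Substitutions that need two or three bases are not reachable in one step, which is why they are rare among observed mutations.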

A second caveat concerns the virtual certainty that at least a small fraction of residues may fit model 2 and perhaps model 3. Such residues will limit the extent to which evolutionary conservation can predict which mutations will cause hemophilia B in humans. Both identification of more missense mutations and additional sequencing of factor IX from nonmammalian species should ultimately allow an estimate of the fraction of residues fitting models 2 and 3.

Possible Implications for Clinical Research

The development of rapid PCR-based methods for direct sequencing and screening ensures that many protein sequence variants will be detected by the analysis of DNA. Some of these variants will be found in individuals who also carry a normal allele. How does one assess the likelihood of the change being neutral, as opposed to a change that either predisposes to a multifactorial disease in heterozygotes or causes a recessively inherited disease in homozygotes? Given the expense and effort of clinical studies, it would be useful to have criteria for estimating the likelihood that a missense mutation observed in a heterozygote will produce a dysfunctional protein. If further data were to show that the present observations are generally true, the level of evolutionary conservation might provide such criteria.
Alignment of N-Terminal Segment of Factor IX



*Intron 1 
-30v -20v -10v +1v

Human f^PGLmCLLGYLLSAECTVFLDHENANKILNRPKRYNSG 

Cow 

Mouse RA F T..A R...T...T 

Dog ...AS A R...T...S 

Rat ..DAP.. 

Macaque 



*Intron 2*Intron 3 
10v 20v 30v 40v 50v 

Human KLEEFVQGNLERECMEEKCSFEEAREVFENTERTTEFWKQYVDGDQCESN

*Intron 4

60v 70v 80v 90v 100v

Human PCLNGGSCKDDINSYECWCPFGFEGKNCELDVKNIKNGRCEQFCKNSAD

Cow M QA....T A..S K....RDT. 

Mouse I S QV....R A K P. 

Dog ....D.V RA K....LGP. 



*Intron 5 
110v 120v 130v

Human NKVVCSCTEGYRLAENQKSCEPAVPFPCG 

Cow D D 

Mouse ...I Q...D T 

Dog T..Q...D.R 

Figure A1 Amino acid sequences of factor IX from mammalian species, aligned from published sequences. The full-length proteins were aligned from human (Yoshitake et al. 1985), cow (Katayama et al. 1979), dog (Evans et al. 1989), and mouse (Wu et al. 1990). For the activation and catalytic domains, additional sequence from sheep, guinea pig, rat, pig, and rabbit (Sarkar et al. 1990) was used in the analyses that compared nine with four species (table 4). The translation start site is based on additional data from rat and macaque (Pang et al. 1990). The factor IX-specific residues are underlined.



140v * 150v 160v 170v

Human       RVSVSQTSK-LTR AEAVFPDVDYVNSTEA ETILDNITQST
Sheep       . A..LH...K TI.SNMN.E..S.. .I.W..V...N
Pig         . .HSPTT II.SNM..E....V .P...SL.E N
Rabbit      ....HA..KI.. .TTI.SNTE.E.F... ...RG.V..RS
Guinea Pig  . .IPSV. .EHN. .N. I.SRMG. . .F.DDETIWDDNDDD. . .W. .S.E. .
Rat         . . .AYN. . KI . . . .T. .SNT. .G. . . . L--ILDDITN-S L. ENS
Mouse       A.I.YS..KI. . ..T..SNM..E VFIQDDITD-GA. .N.V.E.S
Cow         HI..K TI.SNTN.E..S.. .I.W..V. N
Dog         ...PHI.MTR.. ..TL.SNM..E V .K V.



180v-220v (*Intron 6): aligned rows for human, sheep, pig, rabbit, guinea pig, rat, mouse, cow, and dog in this segment are not recoverable from the source layout.

*Intron 7 

230v 240v 250v 260v 270v 

Huma " VETGVKITVVAGEHNIEETEHTEQKRNVIRI IPHHNYNAAINKYNHDIAL 

Sheep IKP T.KP.P A. .Y.G. . .S. . . .S 

Pig LP Y.T....P...R A....S...TV...S 

Rabbit IKPDDN Y..Q...N Y.K...T 

Guinea Pig ILP.I..E K KK.D. . .R. . .TQ. . L. .S. . .SF.' ! '.S. '. '. '. '. 

Rat LKP.D..E DEK.D...R T. . . .Q. . .T. . . .S 

Mouse LKP.D. .E Y. .DKK.D. . .R T Q.. .T S 

Cow IKP T.KP.P A..Y.S...S S 

Dog I.PD I T.KR T.L..S...T 



280v 290v 300v 310v 320v 

Human LELDEPLVL^VIPICIADKEnMfLKFGSGYVSGWGRVFHKGRSALV 

Snee P E R Y NR. ...SI 

Pig T NR TI 

Rabbit ....K..T NR. ..... '.H. ........'.['. !nr! !q!si 

Guinea Pig ....K..S NR A KL.SQ..T.SI 

R at ....K..I V.N K..N...Q.SI 

Mouse ....K..I V.NR K..N...Q.SI 

Cow E RD S...Y K..NR....SI 

Do 9 T R..S N SI 
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Appendix A (continued) 



330v 340v 350v 360v 370v 

Human LQYLRVPLVDRATCLRSTKFTIYNNMFCAGFHEGGRDSCQGDSGGPHVTE 

Sheep K H Y K 

Pig K V...S K...L 

Rabbit F DV..K...E 

Guinea Pig 

Rat S YR...K...E 

Mouse T YR...K...E 

Cow K S..SH Y K 

Dog K K 



380v 390v 400v 410v 

Human VEGTSFLTGI ISWGEECAMKGKYGIYTKVSRYVNWI KEKT KLT 

Sheep 

Pig V 

Rabbit I V..R..W 

Guinea Pig . . . .N 

Rat 

Mouse 

Cow 

Dog ... I 
Appendix B



Alignment of Related Coagulation Serine Proteases

[Figure B: alignment of human and bovine factor IX with human and bovine factor X, factor VII and protein C, numbered by factor IX residue, with intron positions marked.]

Figure B. Factor IX sequences aligned with human and bovine sequences from the related coagulation serine proteases factor VII (Hagen et al. 1986; Takeya et al. 1988), factor X (Fung et al. 1984, 1985), and protein C (Long et al. 1984; Beckmann et al. 1985). The generic and partially generic residues are underlined.
The data base below lists known point mutations and short
deletions and additions in the factor IX gene that cause the
bleeding disorder haemophilia B, or Christmas disease (for reviews,
see Brownlee 1988, Giannelli 1989, Thompson 1990, Green et al
1991a). These mutations result in a defective clotting factor IX, a
glycoprotein of 415 amino acid residues normally present in
plasma and an essential component of the clotting cascade. The
disease is an X-linked recessive inherited disorder affecting about
1 in 30,000 males and only very rarely females.

The purpose of this database is to update last year's edition
(Giannelli et al, 1991) by collecting, in an accessible summary
form, molecular data on the causative mutations of haemophilia
B patients worldwide. It is not intended to replace primary
publications, although it does contain a significant amount of
unpublished work. As in previous years, we have included repeat
observations of the same mutation as well as molecularly unique
mutations. We have continued our database numbering system
(Giannelli et al, 1991), giving all patients a unique Patient Identity
Number (PIN or ID number).

The factor IX gene lies on the long arm of the X chromosome
at Xq27, and its entire 33 kb sequence is known (Yoshitake
et al, 1985). It contains 8 exons (a-h) encoding 6 major domains
of factor IX. These are as follows. (1) Exon a: a hydrophobic
signal peptide, which targets the protein for secretion from the
hepatocyte into the blood stream. (2) Exons b and c: a propeptide
and the Gla domain, the latter containing 12 γ-carboxyglutamyl
residues. This post-translational modification is required for the
correct folding and calcium binding of factor IX. (3) Exon d: a
type B, or first, epidermal growth factor-like domain, which is
homologous to epidermal growth factor (EGF) and also contains
conserved carboxylate residues, including a β-hydroxyaspartate
at amino acid 64. This domain binds an additional Ca²⁺ ion with
high affinity (Handford et al, 1991). (4) Exon e: a type A, or
second, EGF-like domain, which lacks the conserved carboxylate
residues of the type B EGF domain. (5) Exon f: an activation
domain, within which factor XIa cleaves twice, converting factor
IX to IXa. (6) Exons g and h: the serine protease, or catalytic,
domain, responsible for the proteolysis of factor X to Xa. This
region is homologous to other well-studied serine proteases (e.g.
chymotrypsin), and His 221, Asp 269 and Ser 365 are all thought
likely to participate in the classical catalytic mechanism.
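The exon-to-domain organisation described above can be kept as a small lookup table, for instance when annotating which domain a mutation falls in. The following Python is a sketch: the table simply transcribes the six-domain list in the text, and the names EXON_DOMAINS and exons_for_domain are hypothetical.

```python
from typing import Dict, List

# Exon (a-h) to domain, transcribed from the six-domain list in the text.
EXON_DOMAINS: Dict[str, str] = {
    "a": "signal peptide",
    "b": "propeptide and Gla domain",
    "c": "propeptide and Gla domain",
    "d": "first (type B) EGF-like domain",
    "e": "second (type A) EGF-like domain",
    "f": "activation domain",
    "g": "serine protease (catalytic) domain",
    "h": "serine protease (catalytic) domain",
}


def exons_for_domain(domain: str) -> List[str]:
    """Return the exons, in gene order (a to h), that encode `domain`."""
    return sorted(exon for exon, label in EXON_DOMAINS.items() if label == domain)


if __name__ == "__main__":
    print(exons_for_domain("serine protease (catalytic) domain"))  # ['g', 'h']
    print(len(set(EXON_DOMAINS.values())))  # 6 distinct domains
```

Counting the distinct labels reproduces the "8 exons encoding 6 major domains" stated in the text.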

Factor IX is initially synthesised in the liver as a precursor
molecule that is 46, 41 or 39 amino acids longer at its N-terminus
than the 415-residue mature factor IX found in plasma (it is not
known which, although 39 is probable; Pang et al, 1990).
Processing steps in the hepatocyte prior to secretion sequentially
remove the hydrophobic signal peptide and the propeptide. In
addition, the 12 N-terminal glutamyl residues are γ-carboxylated
by a vitamin K-dependent carboxylase, aspartate 64 is partially
β-hydroxylated, N-linked carbohydrate side chains are added at
residues 157 and 167, and at least one O-linked carbohydrate is
added at serine 53.
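The length arithmetic implied by the three candidate N-terminal extensions can be checked with a few lines of Python; the function name is hypothetical and the numbers are those given in the text.

```python
from typing import Dict, Iterable

MATURE_LENGTH = 415  # residues in mature plasma factor IX, from the text


def precursor_lengths(leaders: Iterable[int] = (46, 41, 39),
                      mature_length: int = MATURE_LENGTH) -> Dict[int, int]:
    """Map each candidate N-terminal extension to the resulting precursor length."""
    return {leader: mature_length + leader for leader in leaders}


if __name__ == "__main__":
    for leader, total in precursor_lengths().items():
        print(f"{leader}-residue leader -> {total}-residue precursor")
```

With the probable 39-residue extension the precursor would be 454 residues long; removing the signal peptide and propeptide, which together make up the extension, returns the 415-residue mature protein.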

There are 574 patient entries in this third edition of the database,
compared with 388 last year (Giannelli et al, 1991). Besides point
mutants, it includes 50 short (fewer than 20 nucleotides) deletions,
additions or both, made up of 38 deletions, 9 additions and 3
examples involving both additions and deletions. There are also 12
double mutations, 1 triple mutation, 10 inhibitor patients and 3
female haemophiliacs. The list excludes 29 patients with partial or
complete gene deletions or more complex rearrangements (Thompson,
1990). Of the 574 patients studied (see Summary Table), 278 are
unique molecular events, the remainder being repeats. As is well
known, many of these repeats occur at CG doublets and involve a
CG to TG or CG to CA change. As discussed before (Giannelli et al,
1991), such mutations are believed to occur at genuine 'hotspots'.
However, it is now becoming clear that the high number of repeat
observations at some CG doublets (e.g. 30 examples at nucleotide
31,008) is caused, at least in part, by a founder effect. A founder effect



2028 Nucleic Acids Research, Vol 20, Supplement 



is now well established (Thompson et al, 1990; Bottema et al,
1990b) for a mutation outside a CG doublet (e.g. 27 examples at
nucleotide 31,311), and there are many examples in the database of
mutations repeating 2, 3 or occasionally 4 times. Most, but probably
not all, of these will have a common origin.
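The CG doublet changes discussed above (CG to TG, or CG to CA, which is the same C to T transition read on the opposite strand) can be recognised mechanically. The Python below is a sketch under stated assumptions: it takes a reference sense-strand sequence with 0-based positions rather than the database's nucleotide numbering (Yoshitake et al, 1985), and the function name and example sequence are hypothetical.

```python
def is_cpg_transition(seq: str, pos: int, alt: str) -> bool:
    """Return True if replacing seq[pos] with `alt` turns a CG doublet into TG or CA.

    `seq` is a reference sense-strand sequence and `pos` is 0-based
    (both hypothetical inputs, not database nucleotide numbers).
    """
    seq, alt = seq.upper(), alt.upper()
    if not 0 <= pos < len(seq):
        raise IndexError("position outside the sequence")
    ref = seq[pos]
    # C of a CG doublet mutating to T: CG -> TG.
    if ref == "C" and alt == "T" and pos + 1 < len(seq) and seq[pos + 1] == "G":
        return True
    # G of a CG doublet mutating to A: CG -> CA (C -> T on the other strand).
    if ref == "G" and alt == "A" and pos > 0 and seq[pos - 1] == "C":
        return True
    return False


if __name__ == "__main__":
    ref = "ATCGAT"  # hypothetical sequence with one CG doublet at positions 2-3
    print(is_cpg_transition(ref, 2, "T"))  # True: CG -> TG
    print(is_cpg_transition(ref, 3, "A"))  # True: CG -> CA
    print(is_cpg_transition(ref, 2, "A"))  # False: not a CG transition
```

A match only shows that a substitution lies in a CG doublet; as the text notes, repeated observations at such sites may reflect a founder effect as well as independent recurrence.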

A new feature this year is the inclusion, in the comments section,
of information from the UK and German coordinators on new (de
novo) mutations. In addition, we note whether the mutation
occurred in the mother or in the maternal grandfather or
grandmother. The German coordinator has also included the age
of the parent of the child carrying, or affected by, the mutant
gene. Because only the UK and German coordinators list new
mutations, and even their data are incomplete, this database
cannot yet be usefully analysed for the frequency of new mutations.

The distribution of mutants according to protein domains and
control regions within the gene (see Summary Table) shows that
mutations have been detected in all categories listed except the
poly(A) site. Remarkably, there are now 11 molecularly unique
mutants within a short region of the promoter, and these are
invaluable in studying gene regulation (Crossley & Brownlee,
1990). Missense mutations within exons give valuable information
on the importance of particular amino acid residues. For example,
it is reassuring to note that 3 different mutations have been
discovered at the active site serine (amino acid 365) and 1 at the
proposed active site aspartate (amino acid 269), although none is
yet known at the proposed active site histidine (amino acid 221).
Mutations at 6 of the 12 γ-carboxyglutamyl residues have now been
detected, confirming their critical role in the function of factor IX.
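A tally like the one above (distinct mutations at each proposed active-site residue) can be sketched in Python. The triad residues come from the text; the record format (residue number, substituted amino acid), the function name and the example records are hypothetical and are not database entries.

```python
from typing import Dict, Iterable, Set, Tuple

# Proposed catalytic-triad residues of factor IX (mature numbering, from the text).
CATALYTIC_TRIAD: Dict[int, str] = {221: "His", 269: "Asp", 365: "Ser"}


def distinct_triad_mutations(missense: Iterable[Tuple[int, str]]) -> Dict[str, int]:
    """Count distinct amino acid substitutions at each catalytic-triad residue.

    Repeat observations of the same substitution are counted once.
    """
    seen: Dict[int, Set[str]] = {residue: set() for residue in CATALYTIC_TRIAD}
    for residue, new_aa in missense:
        if residue in seen:
            seen[residue].add(new_aa)
    return {f"{CATALYTIC_TRIAD[r]}{r}": len(seen[r]) for r in sorted(seen)}


if __name__ == "__main__":
    records = [(365, "Pro"), (365, "Arg"), (365, "Pro"), (269, "Asn"), (10, "Lys")]
    print(distinct_triad_mutations(records))
    # {'His221': 0, 'Asp269': 1, 'Ser365': 2}
```

Counting distinct substitutions rather than entries keeps repeat observations (which may share a common origin) from inflating the tally.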

The second (1991) edition of this data base (Giannelli et al,
1991) is now available from the EMBL file server. It can be
obtained by sending the command GET HAEMB.DAT to
NETSERV@EMBL-HEIDELBERG.DE (Internet address). A
documentation file introducing the format used, which differs
from that of the present data base because of restrictions on
transmitting data by electronic mail, is obtained by sending GET
HAEMB.DOC to the same address. The format of the 1991 file
server version of the data base is somewhat similar to that of the
familiar EMBL nucleic acid sequence data base and should allow
easy computer searching for particular features. The new entries
in this 1992 edition of the data base will be reformatted and
transferred to the EMBL file server during 1992.
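The file-server request is an ordinary e-mail message. The Python sketch below only builds the two request messages with the standard library's email package and does not send them. Placing the command in the message body, and the sender address user@example.org, are assumptions; the text specifies only the commands and the NETSERV address.

```python
from email.message import EmailMessage

NETSERV_ADDRESS = "NETSERV@EMBL-HEIDELBERG.DE"  # address given in the text


def build_netserv_request(command: str, sender: str) -> EmailMessage:
    """Build, but do not send, an e-mail carrying a single file-server command.

    Placing the command in the body is an assumption about the server.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = NETSERV_ADDRESS
    msg.set_content(command)
    return msg


if __name__ == "__main__":
    for cmd in ("GET HAEMB.DAT", "GET HAEMB.DOC"):
        request = build_netserv_request(cmd, "user@example.org")  # hypothetical sender
        print(request.as_string())
```

Actually sending a message would additionally need an SMTP relay, for example through `smtplib.SMTP(...).send_message(request)`.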

The data base was compiled from separate lists, updating the
previous year's list, prepared by coordinators for the different
countries as follows: Giannelli and Green representing the UK,
Sweden and Iceland; High and Sommer representing the USA;
Lillicrap representing Canada; Ludwig and Olek representing
Germany; Reitsma representing The Netherlands; Goossens
representing France; Yoshioka representing Japan; and Brownlee
representing the rest of the world and acting as central coordinator.
New data or notification of errors or omissions should be sent to
the individual country coordinators. This database is available from
individual country coordinators on floppy discs written in
WordPerfect 5.1 on an IBM PS2 computer.
1 Amino acid numbers used (Anson et al, 1984)

2 Excluding normal variants within double mutants 

3 For numbering, see Yoshitake et al (1985) 

4 These are possible new splice sites within exons 